Even with aggressive demand management, there will inevitably be cycles of aggregate infrastructure demand that rise and fall by time of day and day of the week. Clever infrastructure management enables the infrastructure service provider to power off excess equipment as aggregate demand decreases, and then power equipment back on as aggregate demand increases. We shall call that process of intelligently powering infrastructure equipment on and off to track aggregate demand "infrastructure commitment," and shall apply unit commitment style solutions from the electric power industry. Infrastructure commitment is considered in the sections that follow.
Lean infrastructure commitment can be applied in parallel with more efficient compute, networking, storage, power, and cooling equipment. By analogy, replacing incandescent light bulbs with more efficient compact fluorescent or LED lights does not eliminate the benefit of turning off lights when they are not needed.
As discussed in Section 5.12: Balance and Grid Operations, scheduling when cloud infrastructure equipment is powered on and off to meet demand with minimal consumption of electric power is very similar to the electric power industry's unit commitment problem. Given the amount of money spent on fuel for all of the thermal generating plants across the planet, the unit commitment problem is very well studied and a range of solutions are known. Today power grid operators routinely solve this problem mathematically, at scale, every few minutes. Insights from solutions to the unit commitment problem are helpful when considering the infrastructure commitment problem.
Resource pooling and other factors make more aggressive startup and shutdown management of infrastructure feasible with cloud. Powering off equipment during low-usage periods enables the infrastructure service provider to reduce power consumed directly by the target equipment, as well as to reduce power consumed by cooling equipment to exhaust waste heat (which is not generated when the equipment is powered off). That reduced power consumption often translates to a reduced carbon footprint (see Section 3.3.15: Carbon Footprint).
The unit commitment mathematical models and software systems used by independent system operators (ISOs) and regional transmission organizations (RTOs) to drive day-ahead operating schedules and real-time dispatch orders often consider inputs such as each generating unit's input–output curve (Figure 9.1) and the temporal parameters discussed below.
Figure 9.1 Sample Input–Output Curve (copy of Figure 5.7)
The temporal parameters of unit commitment are easily understood via Figure 9.2. After a generator startup cycle completes and the generating unit is operating at its economic minimum power output, it is released for economic dispatch so that economic dispatch processes can adjust the unit's power output (between the unit's economic minimum and economic maximum output levels, and subject to the unit's ramp rate). Assume that after the unit's minimum run time it will be shut down, so the economic dispatch process ramps the unit's power down to economic minimum and then releases the unit for shutdown. After executing the shutdown procedure the unit is offline. Sometime later a unit startup order is issued; it takes the notification time for the generator to connect to the power grid at 0 MW of power output. The unit then ramps up from 0 MW to the economic minimum power output during the startup time. When the unit reaches economic minimum power output, it is again released for economic dispatch. Minimum downtime is the minimum time between when a unit is released for shutdown and when the unit is back online at economic minimum output and ready for economic dispatch.
Figure 9.2 Temporal Parameters for Unit Commitment
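These temporal parameters are straightforward to represent in software. The minimal Python sketch below captures the parameters of Figure 9.2 in a simple record; the field names, units, and helper function are illustrative assumptions rather than an industry-standard schema.

```python
from dataclasses import dataclass

@dataclass
class UnitTemporalParameters:
    """Temporal unit commitment parameters per Figure 9.2 (assumed units: minutes, MW)."""
    notification_time: float    # startup order issued -> unit connected to grid at 0 MW
    startup_time: float         # 0 MW -> economic minimum output
    shutdown_time: float        # released for shutdown -> offline
    minimum_run_time: float     # minimum time online before shutdown is permitted
    minimum_downtime: float     # released for shutdown -> back online at economic minimum
    economic_minimum_mw: float  # lowest output level available to economic dispatch
    economic_maximum_mw: float  # highest output level available to economic dispatch
    ramp_rate_mw_per_min: float

def earliest_redispatch_time(unit: UnitTemporalParameters,
                             released_for_shutdown_at: float) -> float:
    """Earliest time the unit can again be released for economic dispatch,
    per the minimum-downtime definition in the text."""
    return released_for_shutdown_at + unit.minimum_downtime
```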
A commitment plan gives a schedule for starting up and shutting down all power generating equipment. For any arbitrary commitment plan one can estimate both (1) the maximum amount of generating capacity that will be online at any point in time and (2) the total cost of implementing that plan. By applying appropriate mathematical techniques (e.g., integer linear programming, Lagrangian relaxation) the power industry picks solutions that "maximize social welfare." A day-ahead plan is constructed as a foundation, and then every 5 minutes (nominally real time in the power industry) that plan is rechecked and fine-tuned. Variable and unpredictable patterns of power generation by renewable supplies like solar and wind make unit commitment and economic dispatch activities more complicated for modern electric grid operators.
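A toy illustration of this trade-off is sketched below: it exhaustively enumerates on/off combinations of three hypothetical generating units to serve an hourly demand forecast at minimum cost. The unit names and cost figures are invented, and the sketch deliberately ignores the temporal coupling constraints (minimum run time, minimum downtime, ramp rates) that make real unit commitment hard enough to require integer linear programming or Lagrangian relaxation.

```python
from itertools import product

UNITS = {                      # name: (economic_max_mw, no_load_cost, cost_per_mw)
    "coal":   (400, 500.0, 20.0),
    "gas":    (200, 150.0, 35.0),
    "peaker": (80,   40.0, 80.0),
}
DEMAND_MW = [350, 520, 610, 480]   # hypothetical hourly demand forecast

def hourly_cost(committed: tuple, demand: float) -> float:
    """Cost of serving demand with a set of committed units, or inf if infeasible."""
    names = [n for n, on in zip(UNITS, committed) if on]
    if sum(UNITS[n][0] for n in names) < demand:
        return float("inf")
    cost = sum(UNITS[n][1] for n in names)       # no-load costs of committed units
    remaining = demand
    # Dispatch cheapest committed units first (a crude stand-in for economic dispatch).
    for n in sorted(names, key=lambda n: UNITS[n][2]):
        served = min(remaining, UNITS[n][0])
        cost += served * UNITS[n][2]
        remaining -= served
    return cost

# Pick the cheapest feasible on/off combination for each hour independently.
plan = [min(product([0, 1], repeat=len(UNITS)),
            key=lambda c: hourly_cost(c, d)) for d in DEMAND_MW]
for hour, (demand, committed) in enumerate(zip(DEMAND_MW, plan)):
    print(hour, demand, [n for n, on in zip(UNITS, committed) if on])
```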
The infrastructure commitment problem is somewhat simpler than the unit commitment problem, except for the service drainage problem. Figure 9.3 frames the temporal infrastructure commitment model for a single infrastructure element (e.g., a physical server that can be explicitly started up and shut down) with the same graphic as the temporal unit commitment model of Figure 9.2. The Y-axis of Figure 9.3 gives rough online infrastructure capacity, and the X-axis gives time. Figure 9.3 explicitly shows power consumed by "infrastructure overhead" as a nominally fixed level during stable operation, and power consumption to serve application loads as variable above that nominally fixed infrastructure overhead. The resources and power consumed by infrastructure overheads will actually vary over time based on infrastructure provisioning and other activities, but this simplification does not materially impact this discussion.
Figure 9.3 Temporal Parameters for Infrastructure Commitment
Let us consider Figure 9.3 in detail across the time window shown.
The infrastructure commitment problem is thus one of constructing an operational plan that schedules issuing startup and shutdown orders to every controllable element in a cloud data center. As with the power industry's unit commitment problem, the infrastructure commitment problem is fundamentally about picking an infrastructure commitment plan (i.e., a schedule of startup and shutdown for all elements in a data center) to assure that sufficient infrastructure capacity is continuously online to reliably serve application workloads at the lowest variable cost. A rough estimate of the available online capacity for an infrastructure commitment plan is the sum of the normal rated capacity of all element instances that are online and nominally available for virtual resource managers to assign new workloads to (i.e., elements on which no drain action is pending). Both startup and shutdown times for the infrastructure elements are also easy to estimate. The difficulties lie in selecting infrastructure elements that can be drained at the lowest overall cost when aggregate demand decreases. Intelligent placement of virtual resource allocations onto physical elements can simplify this infrastructure drainage problem.
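That rough online-capacity estimate reduces to a few lines of code. The sketch below assumes a hypothetical per-element record; the field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Element:
    """Hypothetical record for one controllable infrastructure element."""
    name: str
    rated_capacity: float   # normal rated capacity, in normalized units
    online: bool
    drain_pending: bool

def available_online_capacity(elements: list) -> float:
    """Rough available capacity per the text: sum the rated capacity of elements
    that are online and have no drain action pending."""
    return sum(e.rated_capacity for e in elements
               if e.online and not e.drain_pending)
```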
As with the power industry, infrastructure service providers will likely create day-ahead infrastructure commitment plans of when to startup and shutdown each infrastructure element. The infrastructure service provider then re-evaluates the day's infrastructure commitment plan every 5 minutes or so based on actual demand, operational status of infrastructure equipment, and other factors to determine the least costly commitment actions to assure that online capacity tracks with demand, and explicitly issues startup and shutdown orders for target elements.
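A skeleton of that 5-minute re-evaluation cycle might look as follows, reusing available_online_capacity from the earlier sketch; the reserve margin and decision logic are simplifying assumptions, not a prescribed algorithm.

```python
def commitment_cycle(elements, forecast_demand, reserve_margin=0.10):
    """One 5-minute commitment decision cycle (illustrative logic only):
    compare forecast demand against online capacity and decide orders."""
    online = available_online_capacity(elements)   # from the earlier sketch
    target = forecast_demand * (1.0 + reserve_margin)
    if online < target:
        return ("startup", target - online)    # commit capacity to cover the gap
    if online > target:
        return ("shutdown", online - target)   # decommit surplus via lowest-cost drains
    return ("hold", 0.0)

# Hypothetical driver, run on a 5-minute timer:
#   order, amount = commitment_cycle(elements, forecaster.next_interval())
#   dispatch(order, amount)   # forecaster and dispatch are assumed components
```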
Figure 9.4 highlights the startup time of a single infrastructure element in the general temporal context of Figure 9.3. Element startup time varies by equipment type and configuration. This section methodically considers the factors that drive element startup time.
Figure 9.4 Context of Element Startup Time
Element startup begins when the infrastructure service provider's infrastructure commitment operations support system issues an element startup order to the infrastructure management system that controls power for the target element. The startup time covers activities such as energizing the element, executing power-on diagnostics, retrieving and executing software images from persistent storage, and synchronizing with other systems before the element is ready to accept workload.
Figure 9.5 visualizes the element startup timeline. Note that while Figure 9.3 and Figure 9.4 showed a smooth linear growth in power consumption from 0 W to a higher level when the startup period ends, in reality the element's power consumption is likely to surge as the element is energized, as power-on diagnostics exercise components, as image files are retrieved from persistent storage and executed, and as the element synchronizes with other systems.
Figure 9.5 Element Startup Time
All power consumed between the beginning and end of the element startup time, including power consumed from the startup of ancillary equipment until the first element served by that equipment is ready to service allocation requests from the virtual resource management system, contributes to the element's startup cost.
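As a minimal sketch, startup cost can be approximated by integrating sampled power draw across the startup window for both the element and any ancillary equipment energized on its behalf; the sampling interval and tariff below are assumed values.

```python
def startup_energy_wh(power_samples_w, sample_period_s=1.0):
    """Approximate startup energy by integrating sampled power draw between
    startup-time begin and startup-time end (rectangle rule)."""
    return sum(power_samples_w) * sample_period_s / 3600.0

def element_startup_cost(element_samples_w, ancillary_samples_w, price_per_kwh=0.10):
    """Charge the element's own startup draw plus ancillary equipment (e.g., a
    top-of-rack switch) energized on its behalf; price_per_kwh is an assumed
    tariff, not a figure from the text."""
    energy_kwh = (startup_energy_wh(element_samples_w)
                  + startup_energy_wh(ancillary_samples_w)) / 1000.0
    return energy_kwh * price_per_kwh
```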
Figure 9.6 highlights element shutdown in the context of Figure 9.3. At the highest level, element shutdown time comprises a drain period, during which virtual resources are removed from the target element, followed by an orderly hardware shutdown.
Figure 9.6 Context of Element Shutdown Time
Figure 9.7 visualizes a timeline of the element shutdown process. Before an element shutdown order can be issued, the infrastructure commitment system must first select a specific target to drain. Once that shutdown order is issued, the virtual resource manager will embargo the target system so that no further resource allocations will be placed onto that system. The virtual resource manager will then begin methodically draining all application resources currently placed onto the target element. The virtual resource manager has several options to drain each virtual resource: wait for the resource to drain naturally, request that the application release the resource, migrate the resource to another element, or terminate the resource in an orderly or, as a last resort, disorderly fashion (see Table 9.1).
Figure 9.7 Element Shutdown Time
Note that careful planning can potentially shorten and simplify infrastructure drainage actions. With some application knowledge, the virtual resource manager could intelligently place particular services onto particular resources to ease or minimize draining. For example, by keeping track of the average lifetime of particular applications' virtual machines (VMs), one could map short-lived services onto the same pool of resources and long-lived services onto a different pool of resources. This reduces the need to migrate or terminate many long-lived services.
Note further that while anti-affinity rules (e.g., do not place both application components A and B into the same infrastructure component failure group) are relatively straightforward to continuously enforce when draining or migrating application components, application affinity rules (e.g., do place application components B and C onto the same infrastructure element) are more problematic: they significantly constrain the choice of destination elements that can accept both B and C simultaneously, and they imply that the migration event must be carefully coordinated to minimize the time that affinity rules are violated. While it is often straightforward to implement affinity rules when allocating resources, it is far more challenging to continuously enforce them when resources are migrated from one infrastructure element to another, or when replacing (a.k.a., repairing) a failed application component instance. Treating affinity rules as advisory rather than compulsory, and waiving them when migrating components, thus vastly simplifies lean infrastructure management.
Each of those drain actions has a different completion time, a different impact on user service, and a different burden on the application service provider. Infrastructure service provider policy will likely consider application knowledge when determining which drain action is executed on which resource. Policy will also dictate how long the infrastructure waits for the resource to gracefully drain before taking a more aggressive action, up to disorderly termination, before releasing the element for hardware shutdown.
While the time to complete an orderly hardware shutdown of the target element is likely to be fairly consistent for a particular element configuration, drain times can vary dramatically based on the specific resource instances that are assigned to the target element at the instant that the shutdown order is issued. In particular, drain time is governed by the time it takes for every assigned resource to complete an orderly migration or release, or for infrastructure service provider policy to permit orderly or disorderly termination. Drain time can thus be estimated as the maximum of the expected drain times across all of the resources placed on the target element at any moment in time. The estimated drain time is driven by the infrastructure service provider's preferred primary, secondary, and tertiary (if used) drain actions for each particular resource, and the timeout period for each action. The selected drain actions are likely to be driven by the grade of service configured by the application service provider. For example, Table 9.1 offers sample primary, secondary, and tertiary resource drainage policies for each of the four resource grades of service discussed in Section 7.3.3: Create Attractive Infrastructure Pricing Models; a sketch of this drain time estimate follows the table.
Table 9.1 Sample Infrastructure Resource Drainage Policy
| Resource Grade of Service | Primary Drain Action | Secondary Drain Action | Tertiary Drain Action |
| --- | --- | --- | --- |
| Strict real time – minimal resource scheduling latency and no resource curtailment | Wait for up to X minutes to drain naturally | Request drainage and wait for up to Y minutes | Migrate |
| Real time – small resource scheduling latency with occasional and modest resource curtailment is acceptable | Request drainage and wait for up to Y minutes | Migrate | Orderly termination if migration was unsuccessful |
| Normal – occasional live migration and modest resource scheduling latency and resource curtailment is acceptable | Migrate | Orderly termination if migration was unsuccessful | Disorderly termination if orderly termination was unsuccessful |
| Economy – workload can be curtailed, suspended, and resumed at infrastructure service provider's discretion | Migrate | Orderly termination if migration was unsuccessful | Disorderly termination if orderly termination was unsuccessful |
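To make the drain time estimate concrete, the sketch below encodes Table 9.1's policy chains as ordered lists of (action, timeout) pairs and takes the maximum worst-case drain time over the resources resident on an element, as described above. The timeout values stand in for the X and Y placeholders in the table and are purely illustrative.

```python
# Drainage policies paraphrasing Table 9.1: each grade of service maps to an
# ordered chain of (action, timeout_minutes) pairs. Timeouts are assumptions.
DRAIN_POLICY = {
    "strict_real_time": [("wait", 30), ("request", 15), ("migrate", 10)],
    "real_time":        [("request", 15), ("migrate", 10), ("orderly_terminate", 5)],
    "normal":           [("migrate", 10), ("orderly_terminate", 5), ("disorderly_terminate", 1)],
    "economy":          [("migrate", 10), ("orderly_terminate", 5), ("disorderly_terminate", 1)],
}

def worst_case_drain_minutes(grade: str) -> float:
    """Upper bound for one resource: every action in its chain times out."""
    return sum(timeout for _action, timeout in DRAIN_POLICY[grade])

def element_drain_estimate(resident_grades: list) -> float:
    """Element drain time is governed by the slowest resident resource, per the
    text, so take the max over all resources placed on the element."""
    return max((worst_case_drain_minutes(g) for g in resident_grades), default=0.0)

# Example: a server hosting one strict-real-time and two normal resources.
print(element_drain_estimate(["strict_real_time", "normal", "normal"]))  # -> 55
```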
One can estimate the completion time for each drain action for each resource instance based on throughput and workload on the various infrastructure elements. For instance, one can estimate the time to migrate a resource between two elements based on the size of the resource, the throughput of the source, destination, networking, and any other elements involved in the migration action, and other factors. Throughput limitations may preclude the infrastructure service provider from executing all migration actions simultaneously, but the aggregate time for successful migrations of a particular resource load can be estimated. If the migration failure rate is non-negligible, then drain time estimates can include time for a likely number of orderly termination events, and perhaps even disorderly termination events. Likewise the completion time for an orderly or disorderly resource termination can be estimated. Wait and request drain actions are inherently less predictable, so infrastructure service providers may place resources whose grades of service do not gracefully support migration onto special servers that are designated as always-on.
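For example, a per-migration time estimate can be modeled as transfer size divided by the bottleneck throughput, with aggregate drain time computed under a concurrency limit. The overhead factor and batching model below are simplifying assumptions, and failed-migration retries are ignored.

```python
def migration_seconds(resource_gb: float, src_gbps: float, dst_gbps: float,
                      net_gbps: float, overhead: float = 1.2) -> float:
    """Estimate one migration as transfer size over the bottleneck throughput,
    inflated by an assumed 20% protocol/dirty-page overhead factor."""
    bottleneck = min(src_gbps, dst_gbps, net_gbps)
    return resource_gb * 8.0 * overhead / bottleneck

def aggregate_migration_seconds(resources_gb: list, concurrent: int, **link) -> float:
    """Aggregate drain time when at most `concurrent` migrations run at once;
    a crude batching model that assigns the longest migrations first."""
    per_resource = sorted((migration_seconds(g, **link) for g in resources_gb),
                          reverse=True)
    lanes = [per_resource[i::concurrent] for i in range(concurrent)]
    return max((sum(lane) for lane in lanes if lane), default=0.0)

# Example: drain ten 8 GB VMs, two at a time, over a 10 Gb/s bottleneck.
print(aggregate_migration_seconds([8.0] * 10, concurrent=2,
                                  src_gbps=40.0, dst_gbps=40.0, net_gbps=10.0))
```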
As shown in Figure 9.7, if the target element was the last element to be powered off in a chassis, rack, or row of equipment, then it may be appropriate to power off ancillary systems like top-of-rack Ethernet switches, power distribution units, and cooling mechanisms. Infrastructure resources consumed by the migration, orderly termination, and disorderly termination processes themselves should be charged as drain costs, along with the target element's infrastructure resources consumed from the time drain actions begin until the moment that the target element is powered off.
Lean infrastructure commitment is best understood in the context of an example. Imagine the hypothetical small-scale cloud deployment configuration of Figure 9.8 with four racks of infrastructure equipment A, B, C, and D; assume all equipment in each of the four racks can be independently powered on and off. This example considers VM instances; lean management of persistent virtual storage is beyond the scope of this work. Racks A and B are designated as always-on capacity, meaning that they will not regularly be powered off, while Racks C and D are designated as peaking capacity, which will be powered on and off on a daily basis to track with demand.
Figure 9.8 Sample Small Cloud
An infrastructure commitment operations support system focuses on scheduling and coordinating the powering up (i.e., committing) of equipment in Racks C and D in the morning to track with increasing demand, and the powering down (i.e., decommitting) of equipment in those racks in the evening as demand rolls off. Since Racks A and B are always-on, the virtual resource manager will place resources that cannot tolerate migration onto those racks, and they will host migratable resources during low usage periods when peaking equipment is decommitted.
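A day-ahead plan for this sample cloud might be expressed as a simple time-ordered list of commitment orders; the times below are illustrative and are not taken from Figure 9.9.

```python
# Hypothetical day-ahead commitment plan for the sample small cloud of
# Figure 9.8: Racks A and B stay always-on; peaking Racks C and D are
# committed as demand ramps up in the morning and decommitted in the evening.
DAY_AHEAD_PLAN = [
    ("05:30", "startup",  "Rack C"),
    ("07:00", "startup",  "Rack D"),
    ("20:30", "shutdown", "Rack D"),   # drain first, then power off
    ("22:30", "shutdown", "Rack C"),
]

def orders_due(plan, now_hhmm: str):
    """Return the commitment orders scheduled for the current 5-minute tick."""
    return [(action, rack) for at, action, rack in plan if at == now_hhmm]

print(orders_due(DAY_AHEAD_PLAN, "05:30"))  # [('startup', 'Rack C')]
```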
Figure 9.9 visualizes the highest level commitment schedule for the sample small cloud of Figure 9.8 for a typical day.
Figure 9.9 Hypothetical Commitment Schedule
Note that the "always-on" equipment will occasionally need to be taken offline for maintenance actions like installing security patches. Infrastructure service providers may take advantage of those events to rotate which equipment is "always-on" to level the wear and tear on their infrastructure equipment.
Assume that the infrastructure commitment system executes a decision and planning cycle every 5 minutes, 24 hours a day; a summary of key actions across those 5-minute cycles over a hypothetical day might be as follows:
5:20 am – the day's demand growth is accelerating and the infrastructure commitment system forecasts insufficient capacity to meet the infrastructure service provider's policy objectives at 5:30 am, so capacity commitment orders are issued for a few more servers in Rack C.
Meanwhile, the virtual resource manager notices that the workload on servers in Racks A and B is increasing, so it begins moving migratable workloads off of Racks A and B to underutilized infrastructure capacity in Rack C, and later Rack D. Migratable workloads that resided overnight on always-on servers in Racks A and B can be migrated into peaking capacity on Racks C and D to assure sufficient capacity for the non-migratable workloads that remain hosted by always-on servers in Racks A and B 24/7.
As migration events consume infrastructure resources and inconvenience the impacted application service provider, infrastructure service providers will set policies with some hysteresis that balance power savings against overhead costs and application service provider impact. Setting a policy of not normally migrating any virtual resource more than twice a day (e.g., hosted overnight/off-peak on always-on equipment and hosted in peak periods on peaking equipment) may be a reasonable balance of controllability for the infrastructure service provider and minimal inconvenience for the application service provider. Compelling discounts or policies that entice applications to willingly support occasional and impromptu (i.e., without prior notification) resource migration by the infrastructure service provider will vastly simplify aggressive infrastructure commitment operations. Beyond pricing policies attractive enough for application service providers to opt in to voluntarily support migration (and perhaps other drainage actions), infrastructure service providers must assure that the migration actions themselves minimize impact to user service.
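A minimal sketch of such a hysteresis policy follows, assuming a hypothetical per-VM daily migration budget and an opt-in flag recorded against each VM.

```python
from collections import Counter

MAX_MIGRATIONS_PER_DAY = 2     # policy from the text: at most twice a day
_migrations_today = Counter()  # vm_id -> migrations so far today (reset at midnight)

def may_migrate(vm_id: str, opted_in: bool) -> bool:
    """Hysteresis check: only migrate VMs whose owners opted in (e.g., via
    discounted pricing) and that are still under the daily migration budget."""
    return opted_in and _migrations_today[vm_id] < MAX_MIGRATIONS_PER_DAY

def record_migration(vm_id: str) -> None:
    _migrations_today[vm_id] += 1
```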
Note that since servers in always-on Racks A and B will host more simultaneous – and presumably lightly used – VM instances than the peaking elements in Racks C and D, the servers in Racks A and B might be configured differently, such as being equipped with more RAM to efficiently host a larger number of lightly used VM instances during off-peak periods than peaking servers are likely to experience.