As shown in the hype cycle (Figure 5.1), production of electric power via thermal power plants is very mature while production of virtual compute, memory, storage, and networking resources to serve applications via cloud infrastructure is still rapidly evolving. Fortunately, there are fundamental similarities between these two apparently different businesses that afford useful insights into how the infrastructure service provider operational practices are likely to mature.
Table 5.1 frames (and Figure 5.2 visualizes) the highest-level similarities between production of electricity via thermal power plants for utilities and production of virtual compute, memory, storage, and network resources by cloud infrastructure service providers for application service providers. Power generating organizations use coal or other fuel as input to boiler/turbine/generator systems, while infrastructure service providers use electricity as input to cloud data centers packed with servers, storage devices, and network gear. Others in the ICT industry offered a similar pre-cloud analogy as “utility computing,” so this analogy is not novel.
Table 5.1 Analogy between Electricity Generation and Cloud Infrastructure Businesses
Attribute | Electric Power | Cloud Infrastructure |
Target organization | Power generating organization | Infrastructure service provider of public or private cloud |
Target organization's customer | Load-serving entity (e.g., utility) | Application service provider (ASP), which will be in the same larger organization as infrastructure service provider for private clouds |
Target's customers' customer | Residential, commercial, and industrial power users | End users of applications |
Target organization's product | Bulk/wholesale electrical power for target customer to retail | Virtualized compute, memory, storage, and networking resources to host applications |
Location of production | Power station | Data center |
Means of production | Thermal generating equipment | Commodity servers, storage and networking gear, and enabling software |
Input to production | Fuel (coal, natural gas, petroleum, etc.) | Electricity |
The following sections consider parallels and useful insights between electric power generation via thermal generating systems and cloud computing infrastructure:
The following sections consider several insightful similarities between electricity market and grid operations and cloud infrastructure:
For consistency, this chapter primarily uses the North American Electric Reliability Corporation's continent-wide terminology and definitions from “Glossary of Terms Used in NERC Reliability Standards” (NERC, 2015).
Figure 5.3 highlights the essential points of commercial electric power systems:
A primary concern of load-serving entities is economic dispatch, which is broadly defined as “operation of generation facilities to produce energy at the lowest cost to reliably serve consumers, recognizing any operational limits of generation and transmission facilities.” Note the similarity between the power industry's goal of economic dispatch and the lean cloud computing goal from Chapter 3: Lean Thinking on Cloud Capacity Management: sustainably achieve the shortest lead time, best quality and value, and highest customer delight at the lowest cost.
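A common heuristic for economic dispatch is merit-order commitment: serve demand from the cheapest available generation first. The sketch below is a minimal illustration of that idea; the unit names, capacities, and marginal costs are hypothetical, not drawn from the text.

```python
def merit_order_dispatch(units, demand_mw):
    """Dispatch generating units cheapest-first until demand is met.

    units: list of (name, capacity_mw, marginal_cost_per_mwh) tuples.
    Returns {name: dispatched_mw}; raises ValueError on a capacity shortfall.
    """
    schedule = {}
    remaining = demand_mw
    # Sort by marginal cost so the cheapest units are committed first.
    for name, capacity, _cost in sorted(units, key=lambda u: u[2]):
        take = min(capacity, remaining)
        if take > 0:
            schedule[name] = take
        remaining -= take
        if remaining <= 0:
            return schedule
    raise ValueError(f"capacity shortfall of {remaining} MW")

# Hypothetical fleet: base-load coal is cheapest, a gas peaker most expensive.
fleet = [("coal", 500, 20.0), ("gas_cc", 300, 35.0), ("gas_peaker", 150, 90.0)]
print(merit_order_dispatch(fleet, 650))  # → {'coal': 500, 'gas_cc': 150}
```

Real economic dispatch must also honor ramp rates, minimum run times, and transmission limits, which this sketch deliberately ignores.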
As shown in Figure 5.4, aggregate demand for electric power varies with time of day as well as with day of week and exhibits seasonal patterns of demand. Note that demand for a particular application service – and thus the demand for infrastructure resources hosting that application – can vary far more dramatically from off-peak to peak periods than aggregate electric power demand varies. Larger demand swings for individual applications, coupled with materially less fungible resources, make cloud capacity management more complicated than management of generating capacity for a modern power grid.
Technical, business, regulatory, and market factors, as well as demand variations, result in materially different marginal power costs, as shown in Figure 5.5. Careful planning and economic dispatch are important to controlling a utility's total cost for electric power. As has been seen in California's electricity market, free market pricing for electricity creates opportunities both for significant profits during peak usage and capacity emergency periods and for mischief in manipulating those capacity emergencies. It is unclear if cloud pricing will ever evolve capacity dispatch (a.k.a., variable operating cost) curves as dramatic as those shown in Figure 5.5.
It is straightforward to objectively and quantitatively measure quality of electric power service by considering the following:
Conveniently, electricity service quality can be probed at many points throughout the service delivery path enabling rigorous and continuous service quality monitoring. In contrast, infrastructure service quality delivered to application software instances is more challenging to measure.
Value chains are considered vertically integrated when all key activities are performed by a single organization (e.g., a large corporation) and horizontally structured when key value is purchased via commercial transactions from other companies. An electrical utility that offers retail electricity to end users may source wholesale power from either the utility's captive power generating organizations or on the open market from independent power generators. In the ICT industry, an application service provider can typically source cloud infrastructure capacity from either public cloud or private cloud service providers.
Public cloud services will be offered to application service providers at market prices. Pricing of private cloud resources for application service providers is determined by the organization's business models, but those costs are likely to be related to the infrastructure service provider organization's costs, summarized in Table 5.2.
Table 5.2 Cost Factor Summary
Cost Factors | Electric Power Producer | Cloud Infrastructure Service Provider |
Real estate | Power plant structures and infrastructure | Data center facility |
Production equipment | Thermal generating plant | Compute, storage and networking gear, and necessary software licenses |
Fuel costs | Coal, natural gas, petroleum | Electricity to power cloud infrastructure, including fixed electricity service charges |
Non-fuel variable costs | Plant startup costs, plant shutdown costs | Water for cooling |
Operations | Plant operations center and staff | Data center operations systems, software license fees, staff, etc. |
Maintenance | Routine and preventative maintenance of thermal plant; repairs | Hardware and software maintenance fees; hardware, software, and firmware upgrades; repairs |
Environmental compliance costs | Yes | Yes |
Notable differences between electric power production and cloud infrastructure costs are given in Table 5.2:
In addition, the immaturity of the cloud infrastructure ecosystems means that pricing models, cost structures, and prices themselves may change significantly over the next few years as the market matures.
Electric power across the globe is generated by burning fuel in a boiler to produce steam which spins a turbine which drives a generator. Nominally 2% to 6% of the generator's gross power output is consumed by the pumps, fans, and auxiliary systems supporting the system itself (Wood et al., 2014). As shown in Figure 5.6, cloud infrastructure equipment is roughly analogous to thermal generating plants. Instead of capital intensive boiler/turbine/generator units, cloud infrastructure service providers have myriad servers, storage, and networking equipment installed in racks or shipping containers at a data center. Instead of coal, natural gas, or some other combustible fuel as input, cloud infrastructure equipment uses electricity as fuel. Instead of electric power as an output, infrastructure equipment serves virtualized compute, memory, storage, and networking resources to host application software. Of course, some of the gross processing throughput is consumed to serve hypervisors and other infrastructure management software, so the net available compute power is lower than the gross compute power.
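The gross-versus-net relationship described above is the same arithmetic in both domains. The sketch below simply applies the 2–6% auxiliary-load figure cited above; the function name is illustrative.

```python
def net_output(gross, auxiliary_fraction):
    """Net deliverable output after the unit's own overhead consumption.

    Applies equally to a thermal generator (pumps, fans, auxiliaries) or a
    cloud server (hypervisor and infrastructure management software).
    """
    if not 0 <= auxiliary_fraction < 1:
        raise ValueError("auxiliary_fraction must be in [0, 1)")
    return gross * (1 - auxiliary_fraction)

# A 600 MW generator losing 2% to 6% to auxiliaries nets 588 down to ~564 MW.
print(net_output(600, 0.02), net_output(600, 0.06))
```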
The fundamental variable cost of a thermal plant is fuel consumption, which is visualized with heat rate charts such as Figure 5.7. The X-axis is generator output (e.g., megawatts) and the Y-axis is heat rate (e.g., British Thermal Units of heat applied per hour). Even at “idle” a thermal plant has a minimum fuel consumption rate (fmin) which produces some minimum power output (Pmin). As fuel consumption rate increases to some maximum rate (fmax), the power output increases to some maximum rate (Pmax).
The power industry is fortunate in that generator output can be objectively measured and quantified in Watts, the SI international standard unit of power. In contrast, the ICT industry has no single standard objective and quantitative measurement of data processing output, so the term MIPS for “million instructions per second” or the CPU's clock frequency is casually used to crudely characterize the rate of computations. The variable nature of individual “instructions” executed by different processing elements coupled with ongoing advances in ICT technology means that a single standard objective and quantitative measurement of processing power is unlikely, so objective side-by-side performance comparisons of cloud computing infrastructure elements are uncommon. Recognizing the awkwardness of direct side-by-side comparisons of processing power, ICT equipment suppliers often focus on utilization of the target element's full rated capacity, so “100% utilization” means the equipment is running at full rated capacity and “0% utilization” means the equipment is nominally fully idle. Knowing this, one recognizes Figure 5.8 as a crude proxy “heat rate” chart for a sample commercial server. The X-axis gives processor utilization (a.k.a., CPU occupancy) as a proxy for useful work output and the Y-axis gives electrical power consumed as “fuel” input. Note that work output increases linearly as power/fuel input rises from fmin of 1321 watts to fmax of 4069 watts. Note that advanced power management techniques like dynamic voltage and clock frequency scaling can automatically modulate power consumption but are not generally directly controlled by human operators. Activation and deactivation of these advanced power management mechanisms might not produce the smooth, linear behavior of Figure 5.8, but they do enable power consumption to vary with processing performed.
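Treating Figure 5.8 as a linear proxy "heat rate" curve, server power draw can be modeled as idle power plus utilization times the idle-to-peak span. The sketch below hardcodes the 1321 W and 4069 W figures quoted above; the strictly linear model is itself an assumption for illustration.

```python
IDLE_WATTS = 1321.0   # f_min: power drawn at 0% utilization
PEAK_WATTS = 4069.0   # f_max: power drawn at 100% utilization

def server_power(utilization):
    """Estimated power draw (watts) at a given CPU utilization (0.0 to 1.0),
    assuming the linear utilization-to-power relationship of Figure 5.8."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    return IDLE_WATTS + utilization * (PEAK_WATTS - IDLE_WATTS)

print(server_power(0.0))   # → 1321.0
print(server_power(0.5))   # → 2695.0
print(server_power(1.0))   # → 4069.0
```

Note that even an idle server draws roughly a third of its peak power, which is why committing and decommitting nodes (rather than merely idling them) matters for lean capacity management.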
Table 5.3 compares the unit startup process for a thermal power unit and a cloud infrastructure server node. The actual time from notification to ramp start is nominally minutes for cloud infrastructure nodes and fast starting thermal plants, with specific times varying based on the particulars of the target element. After a unit is online it begins ramping up useful work: thermal power units often take minutes to ramp up their power output and infrastructure server nodes take a short time to allocate, configure and activate virtual resources to serve applications.
Table 5.3 Unit Startup Process
Thermal Power Unit | Cloud Infrastructure Server Node |
Notification – order received to startup a unit and bring it into service | |
Fire up the boiler to make steam – the time to build a head of steam varies based on whether the unit is (thermally) cold or if the unit was held (thermally) warm at a minimum operating temperature | Apply electric power to target node and pass power on self-test |
Spin up turbines – spinning large, heavy turbines up to speed takes time | Boot operating system, hypervisor, and other infrastructure platform software |
Sync generator to grid and connect – the thermal unit's generator must be synchronized to the power grid before it can be electrically connected | Synchronize node to cloud management and orchestration systems |
Ramp Start – useful output begins ramping up | |
Ramp up power – once connected to the grid the thermal unit can ramp up power generation. Ramp rates of power production units are generally expressed in megawatts per minute | Begin allocating virtual resource capacity on target node to applications |
Both thermal units and infrastructure nodes have noninstantaneous shutdown processes. Transaction costs associated with startup and shutdown actions are well understood in the power industry, as well as minimum generator run times and minimum amount of time that a generator must stay off once turned off. As the ICT industry matures, the transaction costs associated with powering on or powering off an infrastructure node in a cloud data center will also become well understood.
Electric power is essentially a flow of electrons (called current) “pushed” with a certain electrical pressure (called voltage). Power is the product of the electrical pressure (voltage) and the volume of flow (current). Electrical loads and transmission lines have preferred ratios of voltage/pressure to current/flow, and this ratio of electrical voltage/pressure to current/flow is called impedance. To maximize power transfer, one matches the impedance of the energy source with the impedance of the load; impedance mismatches result in waste and suboptimal power transfer. Different electrical impedances can be matched via a transformer. As a practical matter, electrical services are standardized across regions (e.g., 120 or 240 volt current that alternates at 60 cycles per second in North America), so most electrical loads (e.g., appliances) are engineered to directly accept standard electric power and perform whatever impedance transformations and power factor corrections are necessary internally.
Application software consumes compute, networking, and storage resources; having the right ratio of compute, networking, and storage throughputs available to the software component results in optimal performance, with minimal time wasted by the application user waiting for resources and minimal infrastructure resource capacity allocated but not used (i.e., wasted). Unfortunately, while an electrical transformer can efficiently convert the voltage:current ratio to whatever a particular load requires, the ratios of compute, memory, storage, and networking delivered by a cloud service provider are less malleable.
Electric power is a fungible commodity; typically consumers neither know nor care which power generation plants produced the power that is consumed by their lights, appliances, air conditioners, and so on. Cloud compute, memory, storage, and networking infrastructure differs from the near perfect fungibility of generated electric power in the following ways:
An important practical difference between electric power and computing is that technologies exist like batteries, pumped storage, thermal storage, and ultracapacitors which enable surplus power to be saved as a capacity reserve for later use, thereby smoothing the demand on generating capacity, but excess computing power cannot practically be stored and used later. Another important difference is that physical compute, memory, storage, and networking equipment benefits from Moore's Law, so equipment purchased in the future will deliver significantly higher performance per dollar of investment than equipment purchased today. In contrast, boilers, turbines, and generators are based on mature technologies that no longer enjoy rapid and exponential performance improvements.
The power industry uses the term rating for a concept analogous to how the ICT industry uses capacity. Figure 5.9 visualizes three NERC (2015) standard rating types:
While the throttle on a power generator and propulsion system can sometimes be pushed beyond “100%” of rated power under emergency conditions, computer equipment does not include an intuitive throttle that can be pushed beyond 100%. However, mechanisms like CPU overclocking, raising the operating temperature of processing components, and increasing the supply voltage delivered to some or all electronic components may temporarily offer a surge of processing capacity, which might be useful in disaster recovery and other extraordinary circumstances.
The power industry uses the term bottled capacity to mean “capacity that is available at the source but that cannot be delivered to the point of use because of restrictions in the transmission system” (PJM Manual 35). Insufficient transmission bandwidth, such as inadequately provisioned or overutilized access, backhaul, or backbone IP transport bandwidth, can trap cloud computing capacity behind a bottleneck as well; thus, one can refer to such inaccessible capacity as being bottled or stranded. This book considers capacity within a cloud data center, not bottlenecks that might exist in the transport and software defined networking that interconnects those data centers and end users.
Wholesale electric power markets in many regions are very mature and load-serving entities routinely purchase power from other utilities and independent generators. In addition to the cost of generating electricity at the site of production, that power must often flow across third-party transmission facilities to reach the consuming utility's power transmission network. Flowing across those third-party facilities both wastes power via electricity transmission losses and consumes finite power transmission grid capacity as congestion. Power markets rollup these factors into the locational marginal price (LMP) which is the price charged to a consuming utility. This price includes:
Figure 5.10 gives an example of day-ahead LMP data.
Application services do not suffer the same physical losses and transmission constraints as electric power hauled over great distances, so there may be no direct transmission-loss costs to consider, although configuration of networking equipment and facilities – and even software limitations – can impose practical limits. However, the farther the cloud infrastructure is from the users of the application hosted on that infrastructure (and from software component instances supporting the target application), the greater the communications latency experienced by the end user. The finite speed of light means that each 600 miles or 1000 kilometers of distance adds nominally 5 milliseconds of one-way transmission latency, and this incremental latency accrues for any transactions or interactions that require a request and response between two distant entities. In addition, greater distance can increase the risk of packet jitter and packet loss, which adds further service latency to time out and retransmit lost packets. Congestion (e.g., bufferbloat) through networking equipment and facilities can also increase packet latency, loss, and jitter.
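The 5 ms per 1000 km rule of thumb above yields a quick estimator of the propagation component of transaction latency. The sketch below assumes propagation delay is the only contributor, ignoring the jitter, loss, and congestion effects just discussed.

```python
MS_PER_1000_KM = 5.0  # nominal one-way propagation latency per 1000 km

def one_way_latency_ms(distance_km):
    """Nominal one-way propagation latency for a given path distance."""
    return distance_km / 1000.0 * MS_PER_1000_KM

def transaction_latency_ms(distance_km, round_trips):
    """Incremental user-visible latency for a transaction requiring
    `round_trips` request/response exchanges with a distant data center."""
    return 2 * one_way_latency_ms(distance_km) * round_trips

# A transaction needing 4 round trips to a data center 2000 km away
# accrues 80 ms of propagation latency alone.
print(transaction_latency_ms(2000, 4))  # → 80.0
```

The multiplication by round trips is the key point: chatty application protocols amplify distance far more than the raw one-way figure suggests.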
Figure 5.11 visualizes the practical implications of locational sensitivity on cloud-based applications. As explained in Section 1.3.1: Application Service Quality, application quality of service is often quantified on a 1 (poor) to 5 (excellent) scale of mean opinion score (MOS). Transaction latency is a key factor in a user's quality of service for many applications, and the incremental latency associated with hauling request and response packets between the end user and the data center hosting the application's software components adds to the user-visible transaction latency. The farther the user is from the data center hosting the application's software components, the greater the one-way transport latency. The structure of application communications (e.g., number of one-way packets sent per user-visible transaction), application architecture, and other factors impact the particular application's service latency sensitivity to one-way transport latency. The sensitivity of the application's service quality can thus be visualized as in Figure 5.11 by considering what range of one-way packet latencies is consistent with excellent (MOS = 5), good (MOS = 4), and fair (MOS = 3) application service quality, and the distances associated with those application service qualities can then be read from the X-axis. Mature application service providers will engineer the service quality of their application by carefully balancing factors like software optimizations and processing throughput of cloud resources against the physical distance between end users and the cloud data center hosting components serving those users. Different applications have different sensitivities, different users have different latency expectations, and different business models support different economics, so the locational marginal value will naturally vary by application type, application service provider, end user expectations, competitors' service quality, and other factors.
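Reading Figure 5.11 in reverse, one can ask how far away a data center may be while still meeting a target service quality. The sketch below assumes hypothetical one-way latency budgets per MOS band; the threshold values are illustrative, not taken from the figure.

```python
# Hypothetical one-way latency budgets (ms) per MOS band for some application.
MOS_LATENCY_BUDGET_MS = {5: 10.0, 4: 25.0, 3: 50.0}
KM_PER_MS = 200.0  # from the nominal 5 ms per 1000 km rule of thumb

def max_distance_km(target_mos):
    """Farthest data center (km) consistent with the target MOS, considering
    only one-way propagation latency (software and processing factors fixed)."""
    return MOS_LATENCY_BUDGET_MS[target_mos] * KM_PER_MS

for mos in (5, 4, 3):
    print(f"MOS {mos}: within {max_distance_km(mos):.0f} km")
```

Under these assumed budgets, excellent quality confines placement to a regional data center, while fair quality tolerates a continental one, which is exactly the trade-off locational marginal value captures.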
Service performance of traditional equipment will naturally shape expectations of cloud infrastructure and applications hosted on cloud infrastructure. Application service providers must weigh these factors – as well as availability of infrastructure capacity – when deciding which cloud data center(s) to deploy application capacity and which application instance should serve each user.
Generating plants and transmission facilities have finite capacity to deliver electricity to subscribers with acceptable power quality. Potentially nonlinear dispatch curves (e.g., Figure 5.5) mean that it is sometimes either commercially undesirable or technically infeasible to serve all demand. The power industry addresses this challenge, in part, with demand-side management, defined by NERC (2015) as “all activities or programs undertaken by Load-Serving Entity or its customers to influence the amount or timing of electricity they use.” Customers can designate some of their power use as interruptible load or interruptible demand which is defined by NERC (2015) as “Demand that the end-use customer makes available to its Load-Serving Entity via contract or agreement for curtailment,” where curtailment means “a reduction in the scheduled capacity or energy delivery.” A powerful technique is direct control load management, defined by NERC (2015) as “Demand-Side Management that is under the direct control of the system operator. [Direct control load management] may control the electric supply to individual appliances or equipment on customer premises. [Direct control load management] as defined here does not include Interruptible Demand.”
The parallels to infrastructure service are direct: demand management of infrastructure service can include (1) curtailing resource delivery to some or all virtual resource users and/or (2) pausing or suspending interruptible workloads. Ideally the infrastructure service provider has direct load management control of at least some workload, meaning that it can pause or suspend workloads on-the-fly to proactively manage aggregate service rather than enduring the inevitable delays and uncertainty when workload owners (i.e., application service provider organizations) are expected to execute workload adjustment actions themselves. This topic is considered further in Chapter 7: Lean Demand Management.
Online power generating capacity is factored into:
Figure 5.12 illustrates how these principles naturally map onto cloud infrastructure capacity:
The power industry defines curtailment as “a reduction in the scheduled capacity or energy delivery” (NERC, 2015). When customer demand temporarily outstrips the shared infrastructure's ability to deliver the promised throughput to all customers, then one or more customers must have their service curtailed, such as rate limiting their resource throughput. Technical implementation of curtailment can generally span a spectrum from total service curtailment of some active users to partial service curtailment for all active users. The service provider's operational policy determines exactly how that curtailment is implemented. For some industries and certain service infrastructures uniform partial service curtailment for all users might be appropriate, and for others more nuanced curtailment policies based on the grade of service purchased by the customer or other factors might drive actual curtailment actions.
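The two ends of the curtailment spectrum described above can be made concrete: uniform partial curtailment for all users versus total curtailment of the lowest grades first. The sketch below is illustrative; the data structures and policy names are assumptions, not drawn from any operator's implementation.

```python
def curtail_uniform(demands, capacity):
    """Scale every user's allocation by the same factor so the total fits
    capacity: partial service curtailment for all active users."""
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)
    factor = capacity / total
    return {user: d * factor for user, d in demands.items()}

def curtail_by_grade(demands, grades, capacity):
    """Fully serve higher grades of service first, so curtailment falls
    entirely on the lowest-grade users."""
    allocation = {user: 0.0 for user in demands}
    remaining = capacity
    for user in sorted(demands, key=lambda u: grades[u], reverse=True):
        take = min(demands[user], remaining)
        allocation[user] = take
        remaining -= take
    return allocation

demands = {"a": 60.0, "b": 30.0, "c": 30.0}   # aggregate 120 vs capacity 100
grades = {"a": 2, "b": 2, "c": 1}              # higher grade = better service
print(curtail_uniform(demands, 100.0))
print(curtail_by_grade(demands, grades, 100.0))
```

Under the uniform policy every user keeps five-sixths of their demand; under the grade-based policy users a and b are untouched and user c absorbs the entire 20-unit shortfall.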
Overselling, overbooking, or oversubscription is the sale of a volatile good or service in excess of actual supply. This is common practice in the travel and lodging business (e.g., overbooking seats on commercial flights or hotel rooms). The ICT industry routinely relies on statistical demand patterns to overbook resource capacity. For example, all N residents in a neighborhood may be offered 50 megabit broadband internet access that is multiplexed onto the internet service provider's access network with maximum engineered throughput of far less than N times 50 megabits. During periods of low neighborhood internet usage the best effort internet access service is able to deliver 50 megabit uploads and downloads. However, if aggregate neighborhood demand exceeds the engineered throughput of the shared access infrastructure, then the service provider must curtail some or all subscribers' internet throughput; that curtailment appears to subscribers as slower downloads and lower throughput.
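The broadband example above can be quantified with a simple statistical-multiplexing simulation: how often does aggregate demand exceed the engineered throughput? The sketch below assumes a deliberately crude activity model (each subscriber independently active at full rate with some probability), which is an assumption for illustration only.

```python
import random

def curtailment_probability(subscribers, rate_mbps, engineered_mbps,
                            p_active, trials=20000, seed=42):
    """Estimate the probability that aggregate subscriber demand exceeds the
    engineered capacity of a shared access link, assuming each subscriber is
    independently active at full rate with probability p_active."""
    rng = random.Random(seed)
    overloads = 0
    for _ in range(trials):
        active = sum(rng.random() < p_active for _ in range(subscribers))
        if active * rate_mbps > engineered_mbps:
            overloads += 1
    return overloads / trials

# 100 subscribers sold 50 Mbit/s each (5000 Mbit/s aggregate) sharing a
# 1000 Mbit/s link: overload is rare if only ~10% are active at once.
print(curtailment_probability(100, 50, 1000, p_active=0.10))
```

This is why a 5:1 oversubscription ratio can be commercially sensible: the overbooked capacity is almost never demanded simultaneously, so curtailment remains a rare event rather than a routine one.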
A well-known service curtailment model in public cloud computing is the spot VM instance offered by Amazon Web Services (AWS), which issues customers “termination notices” 2 minutes before a customer's VM instance is unilaterally terminated by AWS. Curtailment policies for cloud infrastructure resources are considered further in Chapter 7: Lean Demand Management.
Figure 5.13 visualizes the power balance objective that utilities must maintain: at any moment the sum of all power generation must meet all loads, losses, and scheduled net interchange. Committed (a.k.a., online) generating equipment routinely operates between some economic minimum and economic maximum power output; the actual power setting is controlled by:
Load-serving entities consider power generation over three time horizons:
Figure 5.14 visualizes the cloud infrastructure service provider's real-time balance problem, analogous to the energy balance of Figure 5.13. Infrastructure service providers must schedule the commitment of their equipment hosting compute, memory, networking, and storage resources and place the virtual resources hosting application loads across those physical resources to strike a balance between wasting resources (e.g., electricity consumed powering excess online capacity) and curtailing virtual resources because insufficient capacity is online to serve applications' demand with acceptable service quality. Note that when one provider's resource demand outstrips supply, it may be possible to “burst” or overflow demand to another cloud data center or infrastructure service provider.
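The two balance problems share the same arithmetic: supply must cover demand plus overheads, and any persistent gap drives a commit, decommit, or burst decision. The sketch below pairs the Figure 5.13 identity with a Figure 5.14 analog; the function names and the reserve-margin policy are illustrative assumptions.

```python
def required_generation_mw(load_mw, losses_mw, net_interchange_mw):
    """Figure 5.13 identity: generation must equal load plus losses plus
    scheduled net interchange (positive for net exports, negative for
    net imports)."""
    return load_mw + losses_mw + net_interchange_mw

def infrastructure_shortfall(online_capacity, demand, reserve_fraction=0.25):
    """Figure 5.14 analog: capacity that must still be committed (positive)
    or may be decommitted (negative) to keep demand plus an assumed reserve
    margin online."""
    return demand * (1 + reserve_fraction) - online_capacity

print(required_generation_mw(950.0, 30.0, 20.0))         # → 1000.0
print(infrastructure_shortfall(900.0, 800.0, 0.25))      # → 100.0
```

A positive shortfall corresponds to committing more nodes (or bursting overflow demand to another data center); a negative value signals excess online capacity wasting electricity.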
Infrastructure service providers have three fundamental controls to maintain virtual resource balance:
Several planning horizons also apply to infrastructure service provider operations:
The power industry explicitly recognizes the notion of a capacity emergency which can be defined as “a state when a system's or pool's operating capacity plus firm purchases from other systems, to the extent available or limited by transfer capability, is inadequate to meet the total of its demand, firm sales and regulating requirements” (PJM Manual 35). Capacity emergencies apply to cloud infrastructure just as they apply to electric power grids. Capacity emergency events will likely trigger emergency service recovery actions, such as activation of emergency (geographically distributed) reserve capacity (Section 8.6.2).
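The capacity-emergency definition above translates directly into a predicate: operating capacity plus available firm purchases versus demand plus firm sales plus regulating requirements. The sketch below is a minimal illustration; the parameter names are paraphrases of the quoted definition, not standard terms of art.

```python
def capacity_emergency(operating_capacity, firm_purchases_available,
                       demand, firm_sales, regulating_requirement):
    """True when supply-side resources cannot cover total obligations,
    following the PJM-style definition quoted above. Works equally for
    megawatts of generation or units of cloud infrastructure capacity."""
    supply = operating_capacity + firm_purchases_available
    obligation = demand + firm_sales + regulating_requirement
    return supply < obligation

print(capacity_emergency(900, 50, 920, 10, 30))  # supply 950 < 960 → True
print(capacity_emergency(900, 80, 920, 10, 30))  # supply 980 ≥ 960 → False
```

When the predicate turns true, the emergency actions discussed in Section 8.6.2, such as activating geographically distributed reserve capacity, would be triggered.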