Lean Computing for the Cloud

The standard definition of cloud computing is a “paradigm for enabling network access to a scalable and elastic pool of sharable physical or virtual resources with self-service provisioning and administration on demand” (ISO/IEC 17788). This paradigm enables organizations to shift from traditional, capacity-driven operational models to lean, demand-driven operational models. This work adapts lean manufacturing principles across the cloud service delivery chain to pursue the lean cloud computing goal of sustainably achieving the shortest lead time, best quality and value, and highest customer delight at the lowest cost.

Traditionally, ICT systems were configured with sufficient capacity to serve the forecast demand, plus a safety margin, for the upcoming months, quarters, or perhaps even years. After configuring application and resource capacity to serve that forecast demand, further changes to the configuration were often discouraged to reduce the risk of procedural errors or of exposing residual software or documentation defects that might impact production traffic. Thus, significant excess resource and application capacity was often committed that would rarely or never be used, thereby wasting both capital and operating expense. Lean computing pivots from the traditional build-for-peak, supply-oriented operating model to a just-in-time, demand-driven operating model. Lean cloud computing enables sustainable efficiency improvements that are essential when offering service into a competitive and cost-sensitive market.
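The contrast between the two operating models in the paragraph above can be illustrated with a small arithmetic sketch. Everything in it is an assumption made for illustration rather than data from this work: the hourly demand figures, the 30% safety margin, the 20% reserve, the capacity unit size of 10, and the function names. It also assumes that online capacity can be adjusted every hour with no lead time, which real deployments only approximate.

```python
# Illustrative sketch only: demand figures, margins, and names are hypothetical.

def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)

def round_up_to_units(capacity, unit_size):
    """Capacity is assumed to come in whole units (e.g., instances)."""
    return ceil_div(capacity, unit_size) * unit_size

def build_for_peak(hours, forecast_peak, margin_pct, unit_size):
    """Traditional model: forecast peak plus safety margin, held every hour."""
    needed = ceil_div(forecast_peak * (100 + margin_pct), 100)
    return [round_up_to_units(needed, unit_size)] * hours

def demand_driven(hourly_demand, reserve_pct, unit_size):
    """Lean model: each hour's demand plus a reserve (zero lead time assumed)."""
    return [
        round_up_to_units(ceil_div(d * (100 + reserve_pct), 100), unit_size)
        for d in hourly_demand
    ]

def excess_capacity_hours(capacity, demand):
    """Total online capacity that served no demand, summed over all hours."""
    return sum(c - d for c, d in zip(capacity, demand))

# Hypothetical demand over eight hours, in arbitrary capacity units.
demand = [20, 20, 30, 60, 100, 80, 40, 30]

peak_plan = build_for_peak(len(demand), forecast_peak=100, margin_pct=30, unit_size=10)
lean_plan = demand_driven(demand, reserve_pct=20, unit_size=10)

print("build-for-peak excess:", excess_capacity_hours(peak_plan, demand))
print("demand-driven excess: ", excess_capacity_hours(lean_plan, demand))
```

Under these assumed numbers, the build-for-peak plan keeps far more idle capacity online than the demand-driven plan while both cover the same peak hour. The size of the gap depends entirely on the assumed demand shape, margins, and capacity change lead time.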

This work considers lean cloud computing via three interlocking threads of analysis:

Methodically applying lean (i.e., Toyota production system) thinking to the cloud service delivery chain, especially regarding rapid elasticity and scaling. This is the focus of Chapter 3: Lean Thinking on Cloud Capacity Management, Chapter 4: Lean Cloud Capacity Management Strategy, Chapter 7: Lean Demand Management, Chapter 8: Lean Reserves, and Chapter 10: Lean Cloud Capacity Management Performance Indicators.
Applying insights from electric power generation and grid operations to cloud infrastructure operations. This is the focus of Chapter 5: Electric Power Generation as Cloud Infrastructure Analog and Chapter 9: Lean Infrastructure Commitment.
Applying insights from inventory management to cloud capacity management. This is the focus of Chapter 6: Application Capacity Management as an Inventory Management Problem.

This work considers business, architectural, and operational aspects of efficiently delivering valuable services to end users via cloud-based applications hosted on shared cloud infrastructure. It focuses on overall optimization of the service delivery chain so that both application service provider and infrastructure service provider organizations can adopt leaner, demand-driven operations to serve end users more efficiently. Explicitly considering the service delivery challenges of both the cloud service customer organizations that operate applications running on cloud infrastructure and the cloud infrastructure service provider organizations that operate shared cloud resources offers perspective and insight to enable optimizations across the entire service delivery chain, benefiting cloud service providers, cloud service customers, and cloud service users. The work is targeted at readers with business, operational, architectural, development, or quality backgrounds in the ICT industry to help them achieve the shortest lead time, best quality and value, and highest customer delight at the lowest cost for their service offerings. The work does not consider lean or agile development, software-defined networking (SDN), facility planning, or tradeoffs of any particular implementation technology (e.g., virtualization hypervisors versus Linux containers).

This book is structured as follows:

Basics (Chapter 1) – this chapter lays out the key concepts that underpin this analysis: cloud computing principles and roles; demand, supply, capacity, and fungibility; and differentiating demand management, capacity management, and performance management.
Rethinking Capacity Management (Chapter 2) – this chapter reviews traditional, ITIL, and eTOM capacity management models, and factors capacity management into two components for deeper consideration: (1) capacity decision and planning and (2) capacity action fulfillment. The chapter lays out the three fundamental cloud capacity management challenges:
1. Physical infrastructure capacity management – how much physical equipment should be deployed to each cloud data center?
2. Virtual resource capacity management – how much of that physical equipment should be powered on and made available to support application service providers at any point in time?
3. Application capacity management – how much application capacity should be online and available to service user demand at any point in time?
This chapter also frames the cloud computing service delivery chain that will be analyzed in Chapter 3: Lean Thinking on Cloud Capacity Management and in the remainder of the book.
Lean Thinking on Cloud Capacity Management (Chapter 3) – this chapter rigorously applies lean, Toyota production system thinking to cloud computing. The lean cloud computing goal of sustainably achieving the shortest lead time, best quality and value, and highest customer delight at the lowest cost is underpinned by foundational principles and supported by pillars of respect and continuous improvement. These lean notions are methodically applied in subsequent chapters.
Lean Cloud Capacity Management Strategy (Chapter 4) – this chapter methodically applies the lean thinking on cloud capacity management from Chapter 3 to the fundamental cloud capacity management problems of Section 2.5: Three Cloud Capacity Management Problems.
Electric Power Generation as Cloud Infrastructure Analog (Chapter 5) – modern utilities provide electricity-as-a-service with business, technology, and operational models that are analogous to cloud infrastructure-as-a-service providers. This chapter highlights key best practices from the mature electric power generation business that are applicable to cloud infrastructure service providers.
Application Capacity Management as an Inventory Management Problem (Chapter 6) – online application capacity can usefully be imagined as an inventory of regenerative assets maintained by an application service provider to serve user demand. The rapid elasticity of cloud virtual infrastructure enables the application service provider to pivot from a traditional supply/capacity-driven operational model to a demand-driven operational model. This chapter highlights best practices from lean, just-in-time inventory management that are applicable to application service providers.
Lean Demand Management (Chapter 7) – this chapter considers how various demand management techniques, including several from the power industry discussed in Chapter 5: Electric Power Generation as Cloud Infrastructure Analog, can be applied to cloud to support the key lean principle of leveling the workload.
Lean Reserves (Chapter 8) – some safety, spare, or reserve capacity is necessary to assure acceptable service quality across random variations in patterns of demand, failures, and other unforeseen situations. This chapter considers the use and nature of reserve capacity in detail to enable deeper understanding of the appropriate level of reserve capacity.
Lean Infrastructure Commitment (Chapter 9) – this chapter applies the electric power industry's notion of unit commitment for optimally scheduling startup and shutdown of generating equipment, from Chapter 5: Electric Power Generation as Cloud Infrastructure Analog, to the cloud in order to minimize excess online infrastructure capacity (Section 3.3.3), waste heat (Section 3.3.14), and carbon footprint (Section 3.3.15).
Lean Cloud Capacity Management Performance Indicators (Chapter 10) – this chapter offers objective and quantitative performance measures for lean cloud capacity management which can be used to methodically drive continuous improvement of lean cloud computing deployments.
Summary (Chapter 11) – this chapter connects all of the analyses and threads of lean cloud computing considered in earlier chapters in a crisp summary.

Cross-references are included throughout the text to help readers follow analysis through to insights and recommendations. An Index, Abbreviations, and References are also included.