Web Operations by John Allspaw

Web Operations: Keeping the Data on Time

Foreword

Preface

How This Book Is Organized
Who This Book Is For
Conventions Used in This Book
Using Code Examples
How to Contact Us
Safari® Books Online
Acknowledgments

1. Web Operations: The Career

Why Does Web Operations Have It Tough?

A Strong Background in Computing
Practiced Decisiveness
A Calm Disposition

From Apprentice to Master

Knowledge
Tools
Experience

The organizational challenge of inexperience
The concept of "senior operations"

Discipline

Conclusion

2. How Picnik Uses Cloud Computing: Lessons Learned

Where the Cloud Fits (and Why!)

Storage
Hybrid Computing with EC2

Where the Cloud Doesn't Fit (for Picnik)
Conclusion

3. Infrastructure and Application Metrics

Time Resolution and Retention Concerns
Locality of Metrics Collection and Storage
Layers of Metrics

High-Level Business or Feature-Specific Metrics
System- and Service-Level Metrics

Providing Context for Anomaly Detection and Alerts
Log Lines Are Metrics, Too
Correlation with Change Management and Incident Timelines
Making Metrics Available to Your Alerting Mechanisms
Using Metrics to Guide Load-Feedback Mechanisms
A Metrics Collection System, Illustrated: Ganglia

Background
A Quick Introduction to Ganglia

Conclusion

4. Continuous Deployment

Small Batches Mean Faster Feedback
Small Batches Mean Problems Are Instantly Localized
Small Batches Reduce Risk
Small Batches Reduce Overhead
The Quality Defenders' Lament

Why Does It Work?

Getting Started

Step 1: Continuous Integration Server
Step 2: Source Control Commit Check
Step 3: Simple Deployment Script
Step 4: Real-Time Alerting
Step 5: Root-Cause Analysis (Five Whys)

Continuous Deployment Is for Mission-Critical Applications

Another Release? Do I Have To?
The QA Dilemma

Conclusion

5. Infrastructure As Code

Service-Oriented Architecture

Configuration Management

System Integration

The bootstrapping service.
The configuration service.

Step 2: Integrate the services together

Conclusion

6. Monitoring

Story: "The Start of a Journey"
Step 1: Understand What You Are Monitoring
Step 2: Understand Normal Behavior
Step 3: Be Prepared and Learn
Conclusion

7. How Complex Systems Fail

How Complex Systems Fail

As It Pertains Specifically to Web Operations

It will be difficult to tell that the system has failed
It will be difficult to tell what has failed
Meaningful response will be delayed
Communications will be strained and tempers will flare
Maintenance will be a major source of new failures
Recovery from backup is itself difficult and potentially dangerous
Create test procedures that front-line people can use to verify system status
Manage operations on a daily basis
Control maintenance
Assess performance at regular intervals
Be a (unique) customer