Always-on architectures

For many years, architects have had two primary concerns: the availability of a given system and the recoverability of that system (often referred to as disaster recovery). These two concepts exist to address inherent qualities of a system deployed on a limited, on-premises infrastructure. In such an infrastructure, there is a finite number of physical or virtual resources, each performing a very specific function or supporting a specific application. These applications are built in a way that prevents them from running in a distributed manner across multiple machines. As a result, the overall system has many single points of failure, whether a single network interface, a virtual machine or physical server, a virtual disk or volume, and so on.

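Given these inherent fault points, architects developed two principal assessments to gauge the efficacy of a system. The system's ability to remain running and perform its function is known as availability. If a system does fail, its recoverability is gauged by two measurements: the Recovery Time Objective (RTO), the maximum acceptable time to restore service after a failure, and the Recovery Point Objective (RPO), the maximum acceptable window of data loss, measured in time.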

When considering a completely cloud-native architecture, several important factors affect these old paradigms and allow us to evolve them:

Given these cloud features, we believe that cloud architectures make an always-on paradigm achievable. This paradigm helps us plan for outages and architect the system so that it can self-heal and course-correct without any user intervention. This level of automation represents a high level of maturity for a given system and sits at the far end of the Cloud Native Maturity Model.
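
To make the idea of self-healing more concrete, the following is a minimal sketch of a reconciliation loop written in Python. It is an illustration under stated assumptions, not the mechanism of any particular cloud provider: the names Instance, Fleet, and reconcile are hypothetical, the fleet is held in memory, and the health check is a caller-supplied function standing in for whatever probe a real platform would use.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator


@dataclass
class Instance:
    """Hypothetical stand-in for a server, container, or function host."""
    instance_id: str
    healthy: bool = True


@dataclass
class Fleet:
    """Hypothetical in-memory fleet; a real system would call a provider API."""
    desired_count: int
    instances: Dict[str, Instance] = field(default_factory=dict)
    _ids: Iterator[int] = field(
        default_factory=lambda: itertools.count(1), repr=False
    )

    def launch(self) -> Instance:
        # Create a fresh instance with a new identifier.
        inst = Instance(instance_id=f"i-{next(self._ids)}")
        self.instances[inst.instance_id] = inst
        return inst

    def terminate(self, instance_id: str) -> None:
        # Remove the instance if it is still present.
        self.instances.pop(instance_id, None)


def reconcile(fleet: Fleet, is_healthy: Callable[[Instance], bool]) -> int:
    """Run one pass of the loop and return how many instances were launched.

    Unhealthy instances are terminated first, then new ones are launched
    until the fleet is back at its desired size. No operator is involved.
    """
    for inst in list(fleet.instances.values()):
        if not is_healthy(inst):
            fleet.terminate(inst.instance_id)

    launched = 0
    while len(fleet.instances) < fleet.desired_count:
        fleet.launch()
        launched += 1
    return launched
```

Calling reconcile on a schedule, for example with a Fleet whose desired_count is 3 and a health check such as lambda inst: inst.healthy, keeps the fleet at its desired size: a failed instance is removed and replaced on the next pass. The design choice worth noting is that the loop compares the desired state with the observed state rather than reacting to individual failure events, so a missed alert does not leave the system degraded.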

It is important to note that every human endeavor will eventually fail, stumble, or be interrupted, and the cloud is no exception. Because we lack precognition and are constantly evolving our capabilities in IT, it is inevitable that something will break at some point. Understanding and accepting this fact is at the heart of the always-on paradigm: planning for these failures is the only guaranteed way to mitigate or avoid them.