Chapter 30. Making Applications Cloud Ready

And if we are able thus to attack an inferior force with a superior one, our opponents will be in dire straits.

—Sun Tzu

In Chapter 29, Soaring in the Clouds, we covered the history, pros and cons, and typical uses of cloud computing. In this chapter, we continue our discussion of cloud computing but focus on ensuring your applications are ready for the cloud. Attempting to move a monolithic application into the cloud often results in extremely poor performance and a scramble to get back into a colocation or data center facility. This chapter outlines how to evaluate and prepare your application so that moving into the cloud will be a positive experience and not a disaster.

The Scale Cube in a Cloud

Unfortunately, because of misinformation and hype, many people believe that the cloud provides instant high availability and unlimited scalability for applications. Despite functionality such as auto-scaling, this is not the case. Technologists are still responsible for the scaling and availability of their applications. The cloud is just another technology or architectural framework that can be used to achieve high availability and scalability, but it cannot guarantee that your application is available or scalable. Let’s look at how a cloud provider might attempt to provide each axis of the AKF Scale Cube and see what we need to be responsible for ourselves.

x-Axis

Recall from Chapters 22, 23, and 24 that the x-axis of the AKF Scale Cube represents cloning of services or data with absolutely no bias. If we have a service or platform consisting of N systems, each of the N systems can respond to any request and will give exactly the same answer as any other system. This is an x-axis scaling model. There is no bias to the service, customer, or data element. For a database, each x-axis implementation requires the complete cloning or duplication of an entire data set. We called this a horizontal duplication or cloning of data. The x-axis approach is fairly simple to implement in most cases.

In a cloud environment, we implement an x-axis split just as we would in a data center or colocation facility. That is, we replicate the code onto multiple instances or replicate the database onto other instances for read replicas. In a cloud environment, we can do this in several ways. We can launch multiple instances from a machine image and then deploy the identical code via a deployment script. Alternatively, we can build a brand-new machine image from an instance with the code already deployed. This will make the code part of the image. Deploying new images automatically deploys the code. A third way is to make use of a feature many clouds have, auto-scaling. On AWS you build an “Auto Scaling” group, set launch configurations, and establish scaling policies.

Each of these three methods of scaling via the x-axis in a cloud is straightforward and easy to implement. By comparison, y- and z-axis scaling is not so simple.
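As a minimal sketch of the x-axis property, assuming a hypothetical pool of identical clones, the following Python shows a round-robin dispatcher. Because every clone can answer any request, the dispatcher needs no knowledge of the customer, service, or data involved. The class name and instance names are illustrative only and do not correspond to any cloud provider's API.

```python
from itertools import cycle


class RoundRobinPool:
    """Dispatches requests across identical clones (x-axis scaling)."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("pool needs at least one instance")
        self._instances = list(instances)
        self._next = cycle(self._instances)

    def pick(self):
        # No bias: the choice ignores who is asking and what is asked for.
        return next(self._next)


# Hypothetical instance names; any clone can serve any request.
pool = RoundRobinPool(["app-1", "app-2", "app-3"])
picks = [pool.pick() for _ in range(6)]
```

Adding capacity in this model means adding another name to the pool, which is why the x-axis is the easiest axis to automate.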

y- and z-Axes

The y-axis of the cube of scale represents a separation of work responsibility within your application. It is most frequently thought of in terms of functions, methods, or services within an application. The y-axis split addresses the monolithic nature of an application by separating that application into parallel or pipelined processing flows.
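To make the idea concrete, here is a small sketch of a y-axis split, assuming a hypothetical application whose functions are distinguished by URL path prefix. The prefixes and pool names are invented for illustration; the point is only that the routing decision depends on what is being asked for, not on who is asking.

```python
# Hypothetical mapping of URL path prefixes to independently deployed services.
SERVICE_POOLS = {
    "/search": "search-pool",
    "/checkout": "checkout-pool",
    "/account": "account-pool",
}
DEFAULT_POOL = "web-pool"


def route_by_function(path):
    """Return the pool responsible for the function named by the path (y-axis)."""
    for prefix, pool in SERVICE_POOLS.items():
        # Match the prefix itself or anything beneath it, but not "/searching".
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return DEFAULT_POOL
```

Each pool can then be sized, deployed, and cloned along the x-axis on its own schedule.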

The z-axis of the cube is a split based on a value that is “looked up” or determined at the time of the transaction; most often, this split is based on the requestor of the transaction or the customer of your service. This is the most common implementation of the z-axis, but not the only possible implementation. We often see ecommerce companies split along the z-axis on SKUs or product lines. Recall that a y-axis split helps us scale by reducing instructions and data necessary to perform a service. A z-axis split attempts to do the same thing through non-service-oriented segmentation, thereby reducing the amount of data and transactions needed for any one customer segment.

In a cloud environment, these splits do not happen automatically. You as the architect, designer, or implementer must handle this work. Even if you use a NoSQL solution such as MongoDB that facilitates data partitioning or sharding across nodes, you still are required to design the system in terms of how the data is split and where the nodes exist. Should we shard by customer or by SKU? Should the nodes all reside in a single data center or across multiple data centers? In the AWS vernacular, this would be across availability zones (US-East-1a, US-East-1b) or across regions (e.g., US-East-1 [Northern Virginia], US-West-1 [Northern California]). One factor that you should consider is how highly available the data needs to be. If the answer is “very,” then you might want to place replica sets across regions. This will obviously come with costs such as data transfer fees, and it will also increase latency for data replication. Replica sets across availability zones such as US-East-1a and US-East-1b have much lower latency, as they are data centers within a metro area.
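The "shard by customer or by SKU" decision can be sketched as follows. This is a simplified illustration under stated assumptions, not how MongoDB or any other product assigns data: the shard names are hypothetical, and simple modulo hashing is used only because it is short.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical shard names


def shard_for(key, shards=SHARDS):
    """Map a lookup value (customer ID or SKU) to a shard (z-axis)."""
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so it would send the same customer to different shards
    # after a restart.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# The same function works whether the chosen key is a customer or a SKU.
customer_shard = shard_for("customer-42")
sku_shard = shard_for("SKU-000187")
```

Note that with modulo hashing, changing the number of shards remaps most keys, so the choice of key and shard count is a design decision that remains with you rather than with the cloud provider.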

Just because you are working in a cloud environment, it doesn’t mean that you automatically get high availability and scalability for your applications. You as the technologist are still responsible for designing and deploying your system in a manner that ensures it scales and is highly available.

Overcoming Challenges

For all the benefits that a cloud computing environment provides (as discussed in Chapter 29, Soaring in the Clouds), some challenges must still be considered and in some cases overcome to ensure your application performs well in the cloud. In this section, we address two of these challenges: availability and variability in input/output.

Fault Isolation in a Cloud

Just because your environment is a cloud, that does not mean it will always be available. There is no magic in a cloud—just some pretty cool virtualization technology that allows us to share compute, network, and storage resources with many other customers, thereby reducing the cost for all customers. Underneath the surface are still servers, disks, uninterruptible power supplies (UPS), load balancers, switches, routers, and data centers. One of our mantras that we use in our consulting practice is “Everything fails!” Whether it is a server, network gear, a database, or even an entire data center, eventually every resource will fail. If you don’t believe us about data centers failing, just do a Web search for “data center failures.” You’ll find reports such as Ponemon Institute’s 2013 survey, which showed that 95% of respondents experienced an unplanned data center outage.1 A cloud environment, whether it is a private cloud or a public cloud environment, is no different. It contains servers and network gear that fail, and occasionally entire data centers that fail, or in AWS’s case entire regions, which consist of multiple data centers in a metro area, that fail.

1. http://www.emersonnetworkpower.com/documentation/en-us/brands/liebert/documents/white%20papers/2013_emerson_data_center_cost_downtime_sl-24680.pdf.

This isn’t to say that AWS doesn’t have great availability or uptime; rather, the intent here is to point out that you should be aware that its services occasionally fail as well. In fact, Amazon offers a guaranteed monthly service level agreement (SLA) of 99.95% availability of its compute nodes (EC2) or it will provide service credits.2 However, if AWS has 99.95% availability, your service running on top cannot have anything higher. You cannot achieve (assuming AWS meets but doesn’t exceed its SLA) 99.99% availability even if your application has no downtime or outages. If your application has 99.9% availability and the hosting provider has 99.95% availability, then your total availability is likely to be 0.999 × 0.9995 = 0.9985 or 99.85%. The reason is that your application’s outages (43.2 minutes per month at 99.9%) will not always coincide with your hosting provider’s outages (21.6 minutes per month at 99.95%).
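The arithmetic above can be reproduced with a short sketch. It assumes a 30-day month and that the application and the hosting provider fail independently of each other; the function names are ours, chosen for illustration.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes, assuming a 30-day month


def serial_availability(*availabilities):
    """Availability of components that must all be up, assuming independent failures."""
    total = 1.0
    for a in availabilities:
        total *= a
    return total


def downtime_minutes(availability, period_minutes=MINUTES_PER_MONTH):
    """Expected downtime in minutes over the period."""
    return (1.0 - availability) * period_minutes


combined = serial_availability(0.999, 0.9995)
print(f"combined availability: {combined:.6f}")
print(f"application downtime:  {downtime_minutes(0.999):.1f} min/month")
print(f"provider downtime:     {downtime_minutes(0.9995):.1f} min/month")
print(f"combined downtime:     {downtime_minutes(combined):.1f} min/month")
```

The combined downtime comes to roughly 64.8 minutes per month, which is almost the sum of the two individual figures because independent outages rarely overlap.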

2. http://aws.amazon.com/ec2/sla/; accessed October 6, 2014.

Hosting your application or service in a single data center or even a single availability zone in AWS that encompasses multiple data centers is not enough to protect your application from outages. If you need high availability, you should not delegate this responsibility to your vendor. It’s your responsibility and you should embrace it, ensuring your application is designed, built, and deployed to be highly available.

Some of the difficulties of managing this feat within a cloud are the same as those you would encounter in a traditional data center. Machine images, code packages, and data are not automatically replicated across regions or data centers. Your management and deployment tools must be developed so that they can move images, containers, code, and data. Of course, this endeavor doesn’t come for free from a network bandwidth perspective, either. Replicating your database between East Coast and West Coast data centers or between regions takes up bandwidth or adds costs in the form of transfer fees.

While Infrastructure as a Service (IaaS) providers, such as the typical cloud computing environment, are generally not in the business of replicating your code and data across data centers, many Platform as a Service (PaaS) providers do attempt to offer this service, albeit with varying levels of success. Some of our clients have found that the PaaS provider did not have enough capacity to dynamically move their applications during an outage. Again, we would reiterate that your application’s availability and scalability are your responsibility. You cannot pass this responsibility to a vendor and expect it to care as much as you do about your application’s availability.

Variability in Input/Output

Another major issue with cloud computing environments for typical SaaS applications or services is the variability in input/output from a storage perspective. Input/output (I/O) is the communication between compute devices (i.e., servers) and other information processing systems (i.e., another server or a storage subsystem). Inputs and outputs are the signals or data transferred in and out, respectively.

Two primary measurements are used when discussing I/O: input/output operations per second (IOPS) and megabytes per second (MBPS). IOPS measures how many I/O requests the disk I/O path can satisfy in a second. This value is generally in inverse proportion to the size of the I/O requests; that is, the larger the I/O requests, the lower the IOPS. This relationship should be fairly obvious, as it takes more time to process a 256KB I/O request than it does an 8KB request. MBPS measures how much data can be pumped through the disk I/O path. If you consider the I/O path to be a pipeline, MBPS measures how big the pipeline is and, therefore, how many megabytes of data can be pushed through it. For a given I/O path, MBPS is in direct proportion to the size of the I/O requests; that is, the larger the I/O requests, the higher the MBPS. Larger requests give you better throughput because they incur less disk seek time than smaller requests.
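The two measurements are linked by the request size: throughput equals IOPS multiplied by the size of each request. The sketch below works through that relationship with illustrative numbers of our own choosing. It is an idealized model that ignores seek time and caching, so real devices will not track it exactly.

```python
def throughput_mbps(iops, request_kb):
    """Throughput in MB/s implied by an IOPS figure and a request size in KB."""
    return iops * request_kb / 1024.0


def iops_at(throughput_mb_per_s, request_kb):
    """IOPS needed to sustain a given throughput at a given request size."""
    return throughput_mb_per_s * 1024.0 / request_kb


# 200 small (8KB) requests per second move very little data.
small_request_throughput = throughput_mbps(200, 8)

# For a fixed 100 MB/s pipe, larger requests mean far fewer operations per second.
iops_8kb = iops_at(100, 8)
iops_256kb = iops_at(100, 256)
```

This is why a storage path can look healthy by one measure and saturated by the other, and why both should be monitored.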

In a typical system, there are many I/O operations. When the memory requirements on the compute node exceed the allocated memory, paging ensues. Paging is a memory-management technique that allows a computer to store and retrieve data from secondary storage. With this approach, the operating system uses same-size blocks called “pages.” The secondary storage often comprises the disk drives. In a cloud environment, the attached storage devices (disk drives or SSD) are not directly attached storage but rather network storage where data must traverse the network from the compute node to the storage device. If the network isn’t highly utilized, the transfer can happen very quickly, as we’ve come to expect with storage. However, when the network becomes highly utilized (possibly saturated), I/O may slow down by orders of magnitude. When you hosted your application in your data center with directly attached storage (DAS), you might have seen 175–210 IOPS and 208 MBPS on 15K SAS drives.7 If these rates drop to 15 IOPS and 10 MBPS, could your application still respond to a user’s request in a timely manner? The answer is probably “no,” unless you could dynamically spin up more servers to distribute the requests across more application servers.

7. Symantec. “Getting the Hang of IOPS v1.3.” http://www.symantec.com/connect/articles/getting-hang-iops-v13.

“How could a cloud computing environment have its I/O performance reduced by an order of magnitude?” you ask. The answer relates to the “noisy neighbor” effect. In a shared infrastructure, the activity of a neighbor on the same network segment may affect your virtual machine’s performance if that neighbor uses more of the network than expected. This type of problem is referred to in economic theory as the “tragedy of the commons,” wherein individuals, acting independently and rationally according to their own self-interest, behave contrary to the whole group’s long-term best interests by depleting some common resource. Often the guilty party or noisy neighbor can’t help itself from consuming too much network bandwidth. Sometimes the neighbor is the victim of a distributed denial-of-service (DDoS) attack. Sometimes the neighbor has DDoS’d itself with a bug in its code. Because users can’t monitor the network in a cloud environment (this type of monitoring isn’t allowed to be done by customers using the service but it is obviously performed by the cloud provider), the guilty party often doesn’t know the significance of the impact that its seemingly localized problem is having on everyone else.

Since you cannot isolate or protect yourself from the noisy neighbor problem and you don’t know when it will occur, what can you do? You need to prepare your application to either (1) pay for provisioned or guaranteed IOPS; (2) handle significant I/O constraints by increasing compute capacity (auto-scaling via x-axis replication); or (3) redirect user requests to another zone, region, or data center that is not experiencing the degradation of I/O. The third approach also requires a level of monitoring and instrumentation to recognize when degradation is occurring and which zones or regions are processing I/O normally. Handling the variability in I/O by paying more is certainly one option—but just like buying larger compute nodes or servers to handle more user requests instead of continuing to buy commodity hardware, it’s a dangerous path to take. While it might make sense in the short term to buy your way out of a bind, in the long run this approach will lead to problems. Just as you can only buy so much compute capacity in a single server and it gets more expensive per core as you add more such capacity, this approach with a cloud vendor simply gets more expensive as your needs grow.

A far better approach than paying to eliminate the problem is to design your application and deploy it in such a manner that when (not “if”) the noisy neighbor acts up, you can expand your compute capacity to handle the problem. The auto-scaling we discussed earlier is a form of horizontal scaling along the x-axis. This method can also be applied to the database should I/O problems affect your database tier. Instead of more compute nodes, we would spin up more read replicas of our database to scale this problem out. If auto-scaling isn’t an option, perhaps because your application does many more writes than reads (this is almost never the case with SaaS software, as user requests are usually 80% to 90% reads and very few writes), then you might look to the third option of shifting the traffic to a different data center.

Although shifting user requests to a different data center can be done via a disaster recovery scenario with a complete failover, this “nuclear option” is usually difficult to recover from and technologists are generally reluctant to pull the trigger. If your application has a y- or z-axis split that is separated into different data centers or regions, shifting traffic is much less of a big deal. You already have user requests being processed in the other facility and you’re just adding some more.
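The preference order described above (scale out first, shift traffic if scaling will not help) can be sketched as a small decision function. Everything here is a hypothetical illustration: the `ZoneIO` type, the 50% threshold, and the action names are assumptions, and in practice the measurements would come from your own monitoring and instrumentation.

```python
from dataclasses import dataclass


@dataclass
class ZoneIO:
    name: str
    iops: float  # most recent measured IOPS for the zone


def healthy_zones(zones, baseline_iops, min_fraction=0.5):
    """Names of zones whose measured IOPS is at least min_fraction of the baseline."""
    return [z.name for z in zones if z.iops >= baseline_iops * min_fraction]


def choose_action(zones, current_zone, baseline_iops, can_scale_out, min_fraction=0.5):
    """Pick a response to I/O degradation in current_zone.

    Returns ("none", None), ("scale_out", None), ("shift", target_zone),
    or ("alert", None) when no automated option remains.
    """
    healthy = healthy_zones(zones, baseline_iops, min_fraction)
    if current_zone in healthy:
        return ("none", None)
    if can_scale_out:
        return ("scale_out", None)
    others = [name for name in healthy if name != current_zone]
    if others:
        return ("shift", others[0])
    return ("alert", None)
```

A function like this is only as good as the measurements behind it, which is why the shifting option requires monitoring that can tell which zones or regions are processing I/O normally.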

Of course, you can use any combination of these three options—buying, scaling, or shifting—to ensure that you are prepared to handle these types of cloud environment problems. Whatever you decide, leaving the fate and availability of your service or application in the hands of a vendor is not a wise move. Let’s now look at a real-world example of how a company prepared its application for a cloud environment.

Intuit Case Study

In this chapter, we’ve focused on making an existing application or SaaS offering cloud ready. Sometimes, however, you have to start designing an application from the ground up to be capable of running in a cloud environment. The SaaS offering that we discuss in this section was built specifically to run in a cloud environment, with the developers taking into consideration the constraints and problems that come with that decision up front in the design and development phases.

Intuit’s Live Community makes it easy to find answers to tax-related questions by facilitating connections with tax experts, TurboTax employees, and other TurboTax users. You can access Live Community from any screen in Intuit’s TurboTax application to post a question and get advice from the community. One of Intuit’s Engineering Fellows, Felipe Cabrera, described the Live Community cloud environment as one in which “everything fails” and “[these failures] happen more often than in our gold-plated data centers.” With this understanding of the cloud environment, Intuit designed the application to run across multiple availability zones within Amazon Web Services (AWS). It also designed the solution with the capability to move between AWS regions for the purposes of disaster recovery. Elastic load balancers distribute users’ requests across machine instances in multiple availability zones.8 This fault isolation between zones helps with the availability issues that sometimes affect machine instances or entire availability zones.

8. For details of this service, see http://aws.amazon.com/about-aws/whats-new/2013/11/06/elastic-load-balancing-adds-cross-zone-load-balancing/.

To accommodate the variability in I/O or noisy neighbor problems, the team implemented several different architectural strategies. The first was an x-axis split of the database, implemented by creating a read replica of the database that the application can use for read (select) queries. The next strategy was designing and developing the application so that it could make use of AWS Auto Scaling,9 a Web service designed to launch or terminate Amazon EC2 instances automatically based on user-defined policies, schedules, and health checks. This allows for the application tier to scale via the x-axis as demand increases or decreases or as the application begins performing better or worse because of the variability in I/O.

9. For details of this service, see http://aws.amazon.com/documentation/autoscaling/.

To monitor the performance of the critical customer interactions, the team built an extensive testing infrastructure. This infrastructure is often modified and extended. It allows the team to monitor whether changes to the service have the expected impact on the performance of customer interactions.

While these designs helped the Live Community scale in a cloud environment, they didn’t come without hard-won lessons. One painful lesson dealt with the use of a read database slave for select queries. The library that the developers employed to enable this split in read and write queries did not fail gracefully. When the read database slave was not available, the entire application halted. The problem was quickly remedied, but the database split designed to improve scalability ended up costing some availability until the developers fixed the hard dependency issue. This incident also highlighted the lack of a mechanism to test the failure of an availability zone.
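The source does not describe how the Live Community developers removed the hard dependency, so the following is only a generic sketch of one way to make a read/write split fail gracefully: if the replica cannot be reached, reads fall back to the primary. The exception type, the `ReadRouter` class, and the callables standing in for database connections are all hypothetical.

```python
class ReplicaUnavailable(Exception):
    """Raised by a connection when its database cannot be reached (hypothetical)."""


class ReadRouter:
    """Sends reads to a replica but falls back to the primary if the replica fails."""

    def __init__(self, primary, replica):
        # primary and replica are callables that execute a query and return a result.
        self._primary = primary
        self._replica = replica

    def read(self, query):
        try:
            return self._replica(query)
        except ReplicaUnavailable:
            # Soft dependency: lose read scale temporarily, keep serving requests.
            return self._primary(query)

    def write(self, statement):
        return self._primary(statement)


def primary(query):
    return f"primary:{query}"


def broken_replica(query):
    raise ReplicaUnavailable("replica down")


router = ReadRouter(primary, broken_replica)
result = router.read("SELECT 1")
```

The fallback trades scalability for availability during the outage, so the primary needs enough headroom to absorb the extra read traffic for a while.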

Another painful lesson was learned with auto-scaling. This AWS service allows EC2 instances (virtual machines) to be added to or removed from the pool of available servers based on predefined criteria. Sometimes the criterion is CPU load; other times it might be a health check. The primary scale constraint for Intuit’s Live Community application was concurrent connections. Unfortunately, the team didn’t instrument the auto-scaling service to watch this metric. When the application servers ran out of connections, they stopped serving traffic, which caused a customer incident where the Live Community wasn’t available for a period of time. Once this was remedied (by adding a health check and by scaling based on concurrent connections), auto-scaling worked very well for Intuit’s application, which has a huge variability in seasonal demand because of the tax season (January to April 15) each year.
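The lesson here is to scale on the metric that actually constrains the application. As a rough sketch, assuming hypothetical per-instance connection limits and pool bounds (none of these numbers come from Intuit), the calculation a connection-based scaling policy needs to perform looks like this:

```python
import math


def desired_instances(total_connections, max_per_instance, target_utilization=0.7,
                      min_instances=2, max_instances=20):
    """Instances needed to keep each server below its target connection utilization."""
    if max_per_instance <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid capacity settings")
    # Leave headroom: plan for each instance to carry only part of its limit.
    usable = max_per_instance * target_utilization
    needed = math.ceil(total_connections / usable)
    # Clamp to the pool bounds so the result is always a valid group size.
    return max(min_instances, min(max_instances, needed))
```

A CPU-based policy would never trigger in the failure described above, because servers that have run out of connections can sit nearly idle while refusing traffic.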

As you can see from this example, even a great company like Intuit, which purposefully designs its application to be cloud ready, can run into problems along the way. What enabled the company to be so successful with the Live Community10 service in the cloud is that it had knowledge about the challenges of a cloud environment, spent time up front to design around these challenges, and reacted quickly when it encountered issues that it could not anticipate.

10. The Live Community team has evolved over time. Four people critical to the service have been Todd Goodyear, Jimmy Armitage, Vinu Somayaji, and Bradley Feeley.

Conclusion

In this chapter, we covered the AKF Scale Cube in a cloud environment, discussed the two major problems for applications being moved from a data center hosting environment into a cloud environment, and looked at a real-world example of a company that prepared its application for the cloud environment.

We can accomplish x-axis splits in the cloud in several ways—launch multiple instances from a machine image and then deploy the identical code, build a brand-new machine image from an instance with the code already deployed, or make use of auto-scaling, a feature found in many cloud environments. However, in a cloud environment, y- and z-axis splits do not happen automatically. You as the architect, designer, or implementer must handle this work. Even if you use a technology that facilitates data partitioning or sharding across nodes, you are still required to design the system in terms of how the data is split and where the nodes exist.

The two main challenges for SaaS companies seeking to move their applications from a data center hosting environment to a cloud environment are availability and variability in I/O. Everything fails, so we need to be prepared when a cloud environment fails. There are numerous examples of cloud providers that have experienced outages of entire data centers or even multiple data centers. If you need high availability, you should not delegate this responsibility to your vendor. It’s your responsibility and you should embrace it, ensuring your application is designed, built, and deployed to be highly available.

Creating fault-isolated, highly available services in the cloud poses the same challenges that you would find in a traditional data center. Machine images, code packages, and data are not replicated automatically. Your management and deployment tools must be developed such that they can move images, containers, code, and data.

Another major issue with cloud computing environments for typical SaaS applications or services is the variability in input/output from a storage perspective. Because you can’t predict when the variability will occur, you need to prepare your application to either (1) pay for provisioned or guaranteed IOPS, (2) handle significant I/O constraints by increasing compute capacity (auto-scaling via the x-axis replication), or (3) redirect user requests to another zone, region, data center, or other resource that is not experiencing the degradation of I/O.

Building for scale means testing for scale and availability. Testing failover across regions in a cloud is a must. Figuring out how to test slow response time is also an important but often overlooked step.

Key Points

• In a cloud environment, we implement an x-axis split just as we would in a data center or colocation facility. We can do this several ways:

  ◦ Launch multiple instances from a machine image and then deploy the identical code

  ◦ Build a brand-new machine image from an instance with the code already deployed

  ◦ Use auto-scaling, a feature that many clouds have

• In a cloud environment, the y- and z-axis splits do not happen automatically. You as the architect, designer, or implementer must handle this work.

• Availability of compute nodes, storage, network, and even data centers can be a problem in cloud environments. While we have to deal with these issues in a data center environment as well, we have more control over them and better monitoring of them in a data center or colocation facility.

• The variability of input/output as measured by IOPS and MBPS can be a significant issue with getting applications ready to transition into the cloud.

• You can prepare an application for running in the cloud with the variability of I/O by doing the following:

  ◦ Paying for provisioned or guaranteed IOPS

  ◦ Handling significant I/O constraints by increasing compute capacity (auto-scaling via the x-axis replication)

  ◦ Redirecting user requests to another zone, region, data center, or other resource that is not experiencing the degradation of I/O