Chapter 6. Scaling Out

THE WEB IS THE WORLD’S LARGEST ONLINE INFORMATION SYSTEM, scaling to billions of devices and users. Hypermedia documents connect a near limitless number of resources, most of which are designed to be read, not modified. At a grand scale, structural hypermedia rules, helped by the safe, idempotent properties of the ubiquitous GET method, and to a lesser extent some of the other cacheable verbs.

From a programmatic web perspective, the infrastructure that has evolved on the Web—particularly around information retrieval—solves many integration challenges. In this chapter, we look at how we can use that infrastructure and some associated patterns to build scalable, fault-tolerant enterprise applications.

According to the HTTP specification, GET is used to retrieve the representation of a resource. Example 6-1 shows a consumer retrieving a representation of an order resource from a Restbucks service by sending an HTTP GET request to the server where the resource is located.

The value of the Host header plus the relative path that follows GET together give the complete URI of the resource being requested—in this case, http://restbucks.com/order/1234. In HTTP 1.1, servers must also support absolute URIs, in which case the Host header is not necessary, as shown in Example 6-2.

The response to either of these two requests is shown in Example 6-3.

As well as the payload, the response includes some headers, which help consumers and any intermediaries on the network process the response. Importantly, we can use some of these headers to control the caching behavior of the order representation.

As we discussed in Chapter 3, GET is both safe and idempotent. We use GET simply to retrieve a resource’s state representation, rather than deliberately modify that state.

Because GET has no impact on resource state, it is possible to optimize the network to take advantage of its safe and idempotent characteristics. If we see a GET request, we immediately understand that the requestor doesn’t want to modify anything. For these requests, it makes sense to store responses closer to consumers, where they can be reused to satisfy subsequent requests. This optimization is baked into the Web through caching.

Caching is the ability to store copies of frequently accessed data in several places along the request-response path. When a consumer requests a resource representation, the request goes through a cache or a series of caches toward the service hosting the resource. If any of the caches along the request path has a fresh copy of the requested representation, it uses that copy to satisfy the request. If none of the caches can satisfy the request, the request travels all the way to the service (or origin server as it is formally known).

Origin servers control the caching behavior of the representations they issue. Using HTTP headers, an origin server indicates whether a response can be cached, and if so, by whom, and for how long. Caches along the response path can take a copy of a response, but only if the caching metadata allows them to do so. The caches can then use these copies to satisfy subsequent requests. Cached copies of a resource representation can be used to satisfy subsequent requests so long as they remain fresh. A cached representation remains fresh for a specific period of time, which is called its freshness lifetime. When the age of a cached object exceeds its freshness lifetime, the object is said to be stale. Caches will often add an Age response header to a cached response. The Age header indicates how many seconds have passed since the representation was generated at the origin server.

A stale representation must be revalidated with the origin server before it can be used to satisfy any further requests. If the revalidation reveals that the stale representation is in fact still valid, the cached copy can be reused. If, however, the resource has changed since the stale representation was first issued, the cached copy must be invalidated and replaced. Representations can become invalid during their freshness lifetime without the cache knowing. Unless the consumer specifically asks for a revalidation or a new copy from the origin server, the cache will continue to use these invalid (but fresh) representations until they become stale.

One of the Web’s key architectural tenets is that servers and services should not preserve application state. The statelessness constraint helps make distributed applications fault-tolerant and horizontally scalable. But it also has its downsides. First, because application state is not persisted on the server, consumers and services must exchange application state information with each request and response, which adds to the size of the message and the bandwidth consumed by the interaction. Second, because the constraint requires services to forget about clients between requests, it prevents the use of the classical publish-subscribe pattern (which requires the service to retain subscriber lists). To receive notifications, consumers must instead frequently poll services to determine whether a resource has changed, adding to the load on the server.

Caching helps mitigate the consequences of applying the statelessness constraint.[62] It reduces the amount of data sent over the network by storing representations closer to where they are needed, and it reduces the load on origin servers by having caches satisfy repeated requests for the same data.

We’ve discussed several of the benefits of caching. But there are at least four situations in which we might not want to cache data:

A whole ecosystem of proxy servers has grown up around GET and its safe and idempotent semantics. Proxy servers are common intermediaries between consumers and origin servers, which we recognize from our human use of the Web. While they can perform various operations on HTTP requests and responses, such as information filtering and security checks, they are most commonly used for caching.

Many of us are familiar with application caches and database caches—two types of caches that can reside behind service boundaries. Nowadays, many systems also explicitly route requests through distributed in-memory caches. But the kinds of caches we’re talking about here are those that are already part of the installed infrastructure of the Web:

Figure 6-1 shows the many places in which these caches appear on the Web.

Caches can be arranged in complex topologies. They can be clustered to improve reliability or arranged in hierarchies. Caches in a cache hierarchy forward requests for which they do not have a cached representation to other caches farther up the hierarchy, until a cached representation is found or the request is finally passed to the origin server.

Given that caches are designed around the retrieval of resource representations, it shouldn’t come as a surprise to learn that they mostly (but not exclusively) work with GET requests. Responses to GET requests are cacheable by default.[66] Responses to POST requests are not cacheable by default, but can be made cacheable if either an Expires header, or a Cache-Control header with a directive that explicitly allows caching, is added to the response. Responses to PUT and DELETE requests are not cacheable at all.

The more a service supports GET and the appropriate caching headers, the more the Web’s infrastructure can help with scalability. Imagine a situation in which a very inquisitive Restbucks customer repeatedly asks a barista for the status of his coffee. If the barista spends a lot of her time answering questions, her overall output will diminish. Given that the answer stays the same for relatively long periods (e.g., “I’m preparing your medium skim-milk latte”), a lot of effort is wasted for very little benefit. Deploying a cache between the consumer and the Restbucks barista frees the barista from having to answer the same question over and over again. As a result, the overall coffee output of the Restbucks service improves.

There are two main HTTP response headers that we can use to control caching behavior:

If we can determine an absolute expiry time for a cached response, we should use an Expires header. If it’s more appropriate to indicate how long the response can be considered fresh once it has left the origin server, we should use a Cache-Control header, adding a max-age or s-maxage directive to specify a relative Time to Live (TTL).

Cacheable responses (whether to a GET or to a POST request) should also include a validator—either an ETag or a Last-Modified header:

ETag

In Chapter 4, we said that an ETag value is an opaque string token that a server associates with a resource to uniquely identify the state of the resource over its lifetime. When the resource changes, the entity tag changes accordingly. Though we used ETag values for concurrency control in Chapter 4, they are just as useful for validating the freshness of cached representations.

Last-Modified

Whereas a response’s Date header indicates when the response was generated, the Last-Modified header indicates when the associated resource last changed. The Last-Modified value cannot be later than the Date value.

Example 6-4 shows a response containing Expires, ETag, and Last-Modified headers.

This response can be cached and will remain fresh until the date and time specified in the Expires header. To revalidate a response, a cache uses the ETag header value or the Last-Modified header value to do a conditional GET.[67] If a consumer wants to revalidate a response, it should include a Cache-Control: no-cache directive in its request. This ensures that the conditional request travels all the way to the origin server, rather than being satisfied by an intermediary.

Example 6-5 shows a response containing a Last-Modified header, an ETag header, and a Cache-Control header with a max-age directive.

This response is cacheable and will remain fresh for up to one hour. As with the previous example, a cache can revalidate the representation using either the Last-Modified value or the ETag value.

Cache-Control directives serve three functions when used in a response. Some make normally uncacheable responses cacheable. Others make normally cacheable responses uncacheable. Finally, there are some Cache-Control directives that do not affect the cacheability of a response at all; rather, they determine the freshness of an already cacheable response. An individual directive can serve one or more of these functions.

max-age=<delta-seconds>

This directive controls both cacheability and freshness. It makes a response capable of being cached by local and shared caches (proxies and reverse proxies), as well as specifying a freshness lifetime in seconds. A max-age value overrides any Expiry value supplied in a response.

s-maxage=<delta-seconds>

Like max-age, this directive serves two functions: it makes responses cacheable, but only by shared caches, and it specifies a freshness lifetime in seconds.

public

This directive makes a response capable of being cached by local and shared caches, but doesn’t determine a freshness value. Importantly, public takes precedence over authorization headers. Normally, if a request includes an Authorization header, the response cannot be cached. If, however, the response includes a public directive, it can be cached. You should exercise care, however, when making responses that require authorization cacheable.

private

This directive makes a response capable of being cached by local caches only (i.e., within the consumer implementation). At the same time, it prevents normally cacheable responses from being cached by shared caches. private doesn’t determine a freshness value.

must-revalidate

This directive makes normally uncacheable responses cacheable, but requires caches to revalidate stale responses with the origin server. Only if the stale response is successfully validated with the origin server can the cached content be used to satisfy the request. must-revalidate is enormously useful in balancing consistency with reduced bandwidth and computing resource consumption. While it forces a revalidation request to travel all the way to the origin server, an efficient validation mechanism on the server side will prevent the core service logic from being invoked for a large percentage of requests—all for the cost of a measly 304 Not Modified response.

proxy-revalidate

This directive is similar to must-revalidate, but it only applies to shared caches.

no-cache

This directive requires caches to revalidate a cached response with the origin server with every request. If the request is successfully validated with the origin server, the cached content can be used to satisfy the request. The directive only works for responses that have been made cacheable using another header or directive (i.e., it doesn’t make uncacheable responses cacheable). Unfortunately, different caches behave in different ways with regard to no-cache: some caches treat no-cache as an instruction to not cache a response (as per an old draft of HTTP 1.1); some treat it correctly, as a requirement to always revalidate a cached response.

no-store

This directive makes normally cacheable content uncacheable by all caches.

The HTTP Stale Controls Informational RFC recently added two new directives to this list, which together enable us to make trade-offs between latency, availability, and consistency.[68] These directives are:

Squid 2.7 currently supports these last two directives; support is forthcoming in later versions of Squid and Apache Traffic Server.

The directives we’ve looked at so far can be mixed in interesting and useful ways, as the following examples demonstrate. Example 6-6 shows how we can make a representation cacheable by local caches for up to one hour.

Example 6-7 is more interesting, in that it allows caching of representations that require authorization.

public makes the response cacheable by both local and shared caches, while max-age=0 requires a cache to revalidate a cached representation with the origin server (using a conditional GET request) before releasing it. (Ideally, we’d use no-cache, but because some caches treat no-cache as an instruction to not cache at all, we’ve opted for max-age=0 instead.) This combination is useful when we want to authorize each request, but still take advantage of the bandwidth savings offered by the caching infrastructure, as we see in Figure 6-2.

In revalidating each request with the server, the cache will pass on the contents of the Authorization header supplied by the consumer. If the origin server replies 401 Unauthorized, the cache will refuse to release the cached representation. The combination public, max-age=0 differs from must-revalidate in that it allows caching of responses to requests that contain Authorization headers.

Let’s see how Restbucks can take advantage of caching to improve the distribution of its menu. The Restbucks menu is an XML document that is consumed by third-party applications such as coffee shop price comparators and customers. The menu resource is dynamically created from the Restbucks product database. Every time the menu service receives a GET request for the menu, it must perform some logic and database access.

Restbucks would like to ensure that its menu service isn’t overwhelmed by thousands of requests from external services. But instead of deploying more servers or paying for more bandwidth, Restbucks decides to make use of the Web’s caching infrastructure.

This caching infrastructure includes reverse proxies and proxy caches, as well as local caches. Some consumers of Restbucks’ menu service may opt to use their local caches to speed up their systems, knowing that consistency with Restbucks’ data isn’t always guaranteed. Doing so is easy: Example 6-8 shows some simple .NET HTTP client code that uses the WinINet cache provided by Microsoft Windows Internet Services. The WinINet cache is the same local cache that Internet Explorer uses, and so has a large installed base.

To take advantage of local caching, we need only add a RequestCachePolicy instance to our request. The policy is initialized with a RequestCacheLevel.Default enum value, which ensures that the local cache is used to try to satisfy the request. If the local cache can’t satisfy the request, the request will be forwarded to the origin server (or to any intervening shared caches).

On the server side, the menu service is implemented using an instance of the .NET Framework’s HttpListener class.[69] Example 6-9 shows the code that creates and starts the listener.

When it receives a request, the listener calls its GetMenu(…) method, passing it an HttpListenerContext object, which encapsulates the request and response context. Each request is handled on a separate thread taken from the .NET thread pool. The implementation of GetMenu(…) is shown in Example 6-10.

First, GetMenu(…) sets the ContentType of the response to application/xml. Then it gets an XDocument representation of the menu from a repository and writes it to the response stream.

This implementation produces the response shown in Example 6-11. Every time a consumer attempts to GET the Restbucks menu, the request is handled by this code. In other words, every request consumes processor time. This is because there are no caching headers in the response that would allow any web proxies on the request path to cache the response and directly serve it in the future.

According to the HTTP specification, a web proxy can cache a 200 OK response even if the response doesn’t include any specific caching metadata.[70] Still, it’d be helpful if the service explicitly stated whether a response can be cached. Doing so helps to ensure that the caching infrastructure is used to its full potential. Example 6-12 shows how we can change the implementation of GetMenu(…) to include some caching metadata.

As Example 6-12 shows, we only need to add a single line in order to make the response cacheable. Because we don’t expect the menu to change more than once per week, we inform caches that they can consider the response fresh for up to 604,800 seconds. We also indicate that the representation is public, meaning both local and shared caches can cache it. As a result of this change, the response now contains a Cache-Control header, as shown in Example 6-13.

That one line of code has the potential to dramatically reduce Restbucks’ infrastructure and operational costs. Our menu’s representation now gets distributed at various caches around the Web, as Figure 6-3 illustrates.

Caching doesn’t just work for public-facing services. Using these same web caching techniques, we can also improve the scalability and fault-tolerance characteristics of services we deploy within the boundaries of an organization. If we write our applications with caching in mind, and expose most of our business logic through domain application protocols using GET and caching headers, we can offload much of the processing and bandwidth load to caches without any special coding or middleware.

Because the Web is loosely coupled, weak consistency is a feature of all web-based distributed applications. As a result of the statelessness constraint, a service has no way of notifying consumers when a resource changes. In consequence, consumers sometimes act on stale data. In an attempt to keep up-to-date, many consumers will repeatedly GET (poll) a resource representation to discover whether it has changed recently. But this strategy is only as good as the polling frequency. In general, we must assume that a consumer’s understanding of the state of a resource lags the service’s view of the same resource.

Caching only exacerbates the situation. The moment we introduce caching, we should assume that consumers will become inconsistent with services, and just deal with it. While there are several techniques for increasing the degree of consistency among consumers, caches, and services, the fact remains that different web actors will often have different copies of a resource representation.

The three techniques for improving consistency are:

Expiration raises a couple of issues. On the one hand, a long TTL increases the likelihood that at some point a cached representation will no longer reflect the current state of the resource on the origin server, even though it is still fresh in the cache. That is, fresh representations can become invalid. Consumers of such cached representations must be able to tolerate a degree of latency between a resource changing and its being updated in a cache. On the other hand, representations that have become stale in the cache, but whose resources haven’t changed since they were issued, will in fact prove to be still valid; revalidating such representations, though necessary, is suboptimal in terms of network and server resource usage.

Expiration and validation can be used separately or in combination. With a pure validation-based approach (using, for example, a no-cache directive), consumers and caches revalidate with every request, thereby ensuring that they always have an up-to-date version of a representation. With this strategy, we must assess the trade-offs between increased consistency and the resultant rise in bandwidth and load on the server.

In contrast, an exclusively expiration-based approach reduces bandwidth usage and the load on the origin server, but at the risk of there being newer versions of a resource on the server while older (but still fresh) representations are being served from caches. After a cached representation has expired, a subsequent GET will result in a full representation being returned along the response path, even if the version hasn’t in fact changed on the server.

By using expiration and validation together, we get the best of both worlds. Cached representations are used while they remain fresh. When they become stale, the cache or consumer revalidates the representation with the origin server. This approach helps reduce bandwidth usage and server load. There’s still the possibility, however, that representations that remain fresh in a cache become inconsistent with resource state on the origin server.

A cache can determine whether a resource has changed by revalidating a cached representation with the origin server. In Chapter 4, we used ETag values with If-Match and If-None-Match headers (and Last-Modified values with If-Unmodified-Since and If-Modified-Since headers) to do conditional updates and deletes. Validation is accomplished using the same headers and values, but with conditional GETs.

A conditional GET tries to conserve bandwidth by sending and receiving just HTTP headers rather than headers and entity bodies. A conditional GET only exchanges entity bodies when a cached resource representation is out of date. In simple terms, the conditional GET pattern says to a server: “Give me a new resource representation only if the resource has changed substantially since the last time I asked for it. Otherwise, just give me the headers I need to keep my copy up-to-date.”

Conditional GETs are useful only when the client making the request has previously fetched and held a copy of a resource representation (and the attached ETag or Last-Modified value). To revalidate a representation, a consumer or cache uses a previously received ETag value with an If-None-Match header, or a previously supplied Last-Modified value with an If-Modified-Since header. If the resource hasn’t changed (its ETag or Last-Modified value is the same as the one supplied by the consumer), the service replies with 304 Not Modified (plus any ETag or Location headers that would normally have been included in a 200 OK response). If the resource has changed, the service sends back a full representation with a 200 OK status code.

When a service replies with 304 Not Modified, it can also include Expires, Cache-Control, and Vary headers. Caches can update their cached representation with any new values in these headers. A 304 Not Modified response should also include any ETag or Location headers that would ordinarily have been sent in a 200 OK response; including these headers ensures that as well as the cached resource state, the consumer’s cached metadata is also kept up-to-date.

Example 6-14 shows two request-response interactions: a GET, which returns an entity body, and then a revalidation, which uses the Last-Modified value from the first response with an If-Modified-Since header. The revalidation says “execute this request only if the entity has changed since the Last-Modified time supplied in this request.”

Example 6-15 shows a similar pair of interactions, but this time the revalidation uses an If-None-Match header with an ETag value. This revalidation says “execute this request only if the ETag belonging to the entity is different from the ETag value supplied in the request.”

When developing services, we have to decide the best way of calculating entity tags on a case-by-case basis. Consumers, however, should always treat ETags as opaque string tokens—they don’t care how they’re generated, so long as the tag discriminates between changed representations.

There are two things to consider when implementing ETags in a service: computation and storage. If an entity tag value can be computed on the fly in a relatively cheap manner, there’s very little point in storing the value with the resource—we can just compute it with each request. If, however, generating an entity tag value is a relatively expensive operation, it’s worth persisting the computed value with the resource.

Computationally cheap ETag values can be generated using quoted string versions of timestamps, as we discussed in Chapter 4. This is generally a “good enough” solution for entities that don’t change very often. When a consumer includes an entity tag value generated using this method in a conditional request, evaluating the conditional request is often as simple as comparing the supplied value against a file or database row timestamp. For collections, we can use the timestamp of the most recently updated member of the collection. Using a timestamp in an ETag header rather than—as is more usual—a Last-Modified header allows us to evolve the service validation strategy without requiring corresponding changes in consumers. For example, we might choose to use a timestamp-based strategy in an early version of a service because the business context ensures that resources change relatively infrequently. If the business process later evolves such that resources change twice or more during a single second, we can safely change the service-side entity tag generation and validation strategy without consumers having to evolve in lockstep. If we’d initially used a Last-Modified header, consumers would have to switch to using ETags.

The most expensive ETag values tend to be those that are computed using a hash of a representation. Hashes can be computed from just the entity body, or they can include headers and header values as well. If hashing headers, avoid including header values containing machine identity. This is to avoid problems when scaling out, where many machines serve identical representations. If a representation’s ETag value encodes something host-specific, caches will end up with multiple copies of a representation differing only by origin server. Similarly, if we generate ETag values based on a hash of the headers as well as the entity body, we should avoid using the Expires, Cache-Control, and Vary headers, which can sometimes be used to update a cached representation after revalidating with the origin server.

As an optimization, we might consider caching precomputed entity tag values in an in-memory structure. Once again, load-balanced, multimachine scenarios introduce additional complexity here, but if computing the value on the fly is especially expensive, or accessing a persisted, precomputed value becomes a bottleneck in the system, a distributed, in-memory cache of ETag values might just be the thing we need to help save precious computing resources.

Most of the work of implementing conditional GET takes place in the service code, which has to look for the If-Modified-Since and If-None-Match headers, evaluate their conditions, and construct a 200 OK or 304 Not Modified response as appropriate. Services that don’t set caching headers, or that incorrectly handle conditional GETs, can have a detrimental effect on the behavior of the system.

The bandwidth, latency, and scalability benefits of using conditional GET should be clear by now, but at what expense? It might seem as though to realize these benefits we have quite a bit of work to do on the client and the server—storing entity tags, adding If-Modified-Since and If-None-Match headers to requests, and updating store representations with response header values.

But guess what? Most caches handle this behavior for us for free. Consumer applications don’t need to take any notice of ETag and Last-Modified values: validations are dealt with by the underlying caching infrastructure.

We’ve already seen how a service can control the expiration of a representation using the Expires header and certain Cache-Control directives in a response. Consumers, too, can influence cache behavior. By sending Cache-Control directives in requests, consumers can express their preference for representations that fall within particular freshness bounds, or their tolerance for stale representations.

These Cache-Control directives allow a consumer to make trade-off decisions around consistency and latency. Consider, for example, an application that has been optimized for latency (by making the majority of representations cacheable). Consumers that require a higher degree of consistency can use max-age or min-stale to obtain representations with stricter freshness bounds, but at the expense of the cache revalidating with the origin server more often than dictated by the server. In contrast, consumers that care more about latency than consistency can choose to relax freshness constraints, and accept stale representations from nearby caches, using max-stale or only-if-cached.

Another situation where these request directives are useful is after a failed conditional PUT or POST. If a conditional operation fails, it’s normal for the consumer to GET the current state of the resource before retrying the operation. In these circumstances, it is advisable to use a Cache-Control: no-cache directive with the request, to force an end-to-end reload that returns the current state of the resource on the server, rather than a still fresh but now invalid representation from a cache.

There are two types of invalidation: consumer-driven invalidation and server-driven invalidation. Server-driven invalidation falls outside HTTP’s capabilities, whereas a form of consumer-driven invalidation is intrinsic to HTTP.

Let’s look at consumer-driven invalidation first. According to the HTTP specification, DELETE, PUT, and POST requests should invalidate any cached representations belonging to the request URI. In addition, if the response contains a Location or Content-Location header, representations associated with either of these header values should also be invalidated.

At first glance, it would appear that a consumer could invalidate a cached representation using a DELETE, PUT, or POST request, and thereafter be confident that this same representation won’t be returned in subsequent GETs. But we must remember that this technique can only guarantee to invalidate caches on the immediate request path. Caches that are not on the request path will not necessarily be invalidated. Once again, we are reminded of the need to deal with the weak consistency issues inherent in the Web’s architecture.

In contrast to the necessarily weak consistency model of consumer-driven invalidation, server-driven invalidation would appear to offer stronger consistency guarantees. With server-driven invalidation, the service sends invalidation notices to the caches and consumers it knows are likely to have a cached representation of a particular resource. With this approach, all interested parties—both those on and those off the request-response path—will be invalidated when a resource changes.

But this is not the way the Web usually works. For such an approach to be successful, a service would have to maintain a list of consumers and caches to be contacted when a resource changes. In other words, the service would have to maintain application state. And holding application state on the server undermines scalability.

Server-driven invalidation only works for caches the server knows about. Moreover, its strong consistency guarantees only hold while the caches that need to be notified of an invalidation event are connected to the service. If a network problem disconnects a cache, causing it to miss one or more invalidation messages, the overall distributed application will be in an inconsistent state—at least temporarily.

It should be clear by now that server-driven invalidation can only partly mitigate the weak consistency issues that come with adopting the Web as an integration platform. Because of the generally web-unfriendly nature of server-driven invalidation, expiration and validation are by far the most common methods of ensuring eventual consistency between services and consumers on the Web.

Once we have determined that a resource’s representations can be cached, we will have to decide which caches to target, together with the freshness lifetimes of the cacheable representations.

When deciding on the freshness lifetime of a representation, we must balance server control with scalability concerns. With short expiration values, the service retains a relatively high degree of control over the representations it releases, but this control comes at the expense of frequent reloads and revalidations, both of which use network resources and place load on the origin server. Longer expiration values, on the other hand, conserve bandwidth and reduce the number of requests that reach the origin server; at the same time, however, they increase the likelihood that a cached representation will become inconsistent with resource state on the server over the course of its freshness lifetime.

Being able to invalidate cached representations would help here; we could specify a long freshness lifetime for each representation, but then invalidate cached entries the moment a resource changes. Unfortunately, the Web doesn’t support a general invalidation mechanism.

There is, however, one way we can work with the Web to make representations as cacheable as possible, but no more. Instead of seeking to invalidate entries, we can extend their freshness lifetime.

Cache channels implement a technique for extending the freshness lifetimes of cached representations.[71] Caches that don’t understand the cache channel protocol will continue to expire representations the moment they become stale. Caches that do understand the protocol, however, are entitled to treat a normally stale representation as still fresh, until they hear otherwise.

Cache channels use two new Cache-Control extensions. Caches that understand these directives can use cache channels to extend the freshness lifetime of cached representations. These extensions are:

Example 6-16 shows a request for a product from Restbucks’ product catalog. The response includes a Cache-Control header containing both cache channel extensions.

The max-age directive specifies that this representation will remain fresh for up to an hour, after which it must be revalidated with the origin server. But any cache on the response path that understands the channel and group extensions can continue to extend the freshness lifetime of this representation as long as two conditions hold:

If a cache performs a GET on the channel specified in the channel extension, it receives the cache channel feed shown in Example 6-17.

This is an Atom feed that has been generated by the origin server. In the next chapter, we discuss Atom feeds in detail and use them to build event-driven systems. For now, all we need to understand is that this is an empty feed—it doesn’t contain any channel events. (Cache channels don’t have to be implemented as Atom feeds, but given that there’s widespread support for Atom on almost all development platforms it’s easy to build cache channels using the Atom format.) The feed’s <cc:precision> element specifies a precision in seconds, meaning that caches that subscribe to this feed must poll it at least as often as every 15 minutes if they want to extend the freshness lifetimes of any representations associated with this channel. The <cc:lifetime> element value indicates that events in this feed will be available for at least a day after they have been issued.

Note

As you can see from Example 6-17, the Atom feed can itself be cached. As we’ll learn in the next chapter, event feeds can take advantage of the Web’s caching infrastructure as much as any other representation.

As long as the cache continues to poll the channel at least every 15 minutes, it can continue to serve the cached product representation well beyond its original freshness lifetime of an hour. If the resource does change on the origin server, the very next time the cache polls the channel it will receive a response similar to the one shown in Example 6-18.

The feed now contains a stale event entry whose alternate <link> element associates it with the group ID to which the product representation belongs (urn:uuid:1f80b2a1-660a-4874-92c4-45732e03087b). Each event has its own ID, which has nothing to do with the identifiers of any cached representations; it’s the <link> element’s href value that associates the event with a group or particular representation.

Seeing this event, the cache stops extending the freshness lifetime of any representations belonging to this group. The next time a consumer issues a request for http://restbucks.com/product-catalog/1234, the cache will revalidate its stale representation with the origin server.

Cache channels work with the Web because they don’t require origin servers to maintain application state in the form of lists of connected caches. Each cache is responsible for guaranteeing the delivery of stale events by polling the cache channel. If a cache can’t connect to the channel, it can no longer continue to extend the freshness lifetime of otherwise stale representations.

By associating cached representations with groups, cache channels provide a powerful mechanism for canceling the extended freshness of several related representations at the same time. This is particularly useful when we decompose an application protocol into several overlapping resources that together manipulate the state of an underlying domain entity. POSTing a completion to http://restbucks.com/orders/1234/completion, for example, may render any fresh representations of http://restbucks.com/orders/1234 invalid. This is the kind of consistency issue traditional invalidation mechanisms seek to address and precisely the kind of challenge the loosely coupled nature of the Web makes difficult to solve. Using small freshness lifetimes together with cache channels, we can reduce the time it takes for the overall distributed application to reach a consistent state.

Of course, cache channels only work for caches that know how to take advantage of the channel and group extensions. Though the HTTP Cache Channels Internet-Draft has now expired, several reverse proxies, including Squid and Varnish, include support for its freshness extension mechanism.[72] But in an environment where not all cache implementations can be controlled by service implementers, the same difficult truth emerges once again: the Web is weakly consistent.

In this chapter, we saw how the safe and idempotent properties of the most popular verb on the Web, GET, are key to building fault-tolerant and scalable systems. The installed infrastructure of the Web includes a caching substrate that we can use to bring frequently accessed representations closer to consumers, thereby reducing latency, conserving bandwidth, masking transient faults, and decreasing the load on services. Services dictate the caching behaviors of the representations they issue; consumers tighten or relax the expectations they have of caches as they see fit.

We also considered the implications of the Web’s weak consistency model. No matter the expiration or validation mechanisms we choose to employ, we must always remember that we cannot guarantee that a representation of resource state as received by a consumer reflects the current state of the resource as held by the service.

In the last section, we looked at how cache channels allow us to extend the freshness lifetimes of cached representations. Our cache channels example used an Atom feed to communicate “stale” events to caches that understand the cache channels protocol.

In the next chapter, we look in more detail at the Atom feed format. Knowing about GET and the caching opportunities offered by the Web, we show how to put these pieces together to create a scalable, fault-tolerant, event-driven system.



[62] Benjamin Carlyle discusses this topic in more detail here: http://soundadvice.id.au/blog/2010/01/17/.

[66] The response should really have either an expiry time, or a validator, as we discuss shortly.

[67] Choosing between ETag values and Last-Modified timestamps depends on the granularity of updates to the resource. Last-Modified is only as accurate as a timestamp (to the nearest second), while ETags can be generated at any frequency. Typically, however, timestamps are cheaper to generate.

[69] For this example, we host the HttpListener instance in a console application. For production, we’d host it in IIS to take advantage of management and fault-tolerance features.

[70] Additional responses that can be cached in this fashion include 203 Non-Authoritative Information, 206 Partial Content, 300 Multiple Choices, 301 Moved Permanently, and 410 Gone.

[71] Cache channels are the brainchild of Mark Nottingham; see http://www.mnot.net/blog/2008/01/04/cache_channels.