Chapter 8. SQS: Simple Queue Service

Amazon’s Simple Queue Service (SQS) provides reliable storage and delivery of messages between any clients or computers with access to the Internet. It allows message senders and recipients to interact without having to communicate directly with each other and without requiring that either side be always available or connected to the network.

SQS combines the key advantages of a conventional messaging architecture (loosely coupled, fault-tolerant communication) with a reliable and flexible distributed infrastructure that stores messages redundantly over multiple data centers. Because SQS is accessible to clients on any platform that can send and receive HTTP requests, the service makes it possible to build distributed applications with truly heterogeneous components using a range of platforms and development languages.

Note

As this book went to press, Amazon Web Services released a new version of the SQS service API: 2008-01-01. This new API includes an updated pricing model that is intended to make the service cheaper for most users; however, it also includes significant changes that are not compatible with previous API versions or the third-party libraries and tools we use in this book.

The previous APIs will remain available until May 6, 2009, after which all SQS users must migrate to the 2008-01-01 version. In this book we describe the older 2007-05-01 API and we will not discuss the new API in depth. For more information about the benefits of the new API and the main differences between this and previous versions, refer to “API Version 2008-01-01.”

SQS is composed of two main resources that can be acted on through the application programming interface (API): messages and queues.

SQS is implemented as a distributed system in which copies of each message are stored in multiple physical servers and potentially across multiple data centers. This strategy provides benefits in terms of redundancy, reliability, and scalability; it also results in some drawbacks that would not apply to a more centralized messaging system. These drawbacks are not serious and can be avoided, provided you bear the following points in mind when designing your SQS-based applications.

Message retrievals may return incomplete results

When a message receiver asks SQS to return the messages available in a queue, the system only samples a subset of its physical servers for messages that belong in the queue. Only the messages stored on the sampled servers are returned to the receiver, though there may be other messages stored on servers that were not sampled. This server sampling technique is represented in Figure 8-1.

The service uses what Amazon calls a weighted random distribution algorithm to determine which servers to sample in each retrieval request. If the first retrieval request does not find all the available messages, subsequent retrievals will eventually return all the messages from all the servers. However, you cannot assume that the result of any particular retrieval request contains every message available in the queue.

You are most likely to receive a limited subset of messages when there are fewer than several hundred messages in a queue. If you query a queue that contains relatively few messages, you may receive no messages at all in some results, even though there are messages available in the queue.
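The effect of server sampling can be illustrated with a small simulation. This is only a sketch under stated assumptions: the server count, the sample size, and the helper names (`make_servers`, `receive_sample`, `poll_until_complete`) are hypothetical, each message is stored on a single server for simplicity, and servers are sampled uniformly at random rather than with Amazon's weighted algorithm, whose details are not described here.

```python
import random

def make_servers(messages, num_servers, rng):
    """Scatter messages across hypothetical storage servers (one copy each)."""
    servers = [[] for _ in range(num_servers)]
    for msg in messages:
        servers[rng.randrange(num_servers)].append(msg)
    return servers

def receive_sample(servers, sample_size, rng):
    """Return only the messages held by a random subset of the servers."""
    sampled = rng.sample(servers, sample_size)
    return [msg for server in sampled for msg in server]

def poll_until_complete(servers, expected, sample_size, rng, max_polls=1000):
    """Poll repeatedly, collecting distinct messages until all have been seen."""
    seen = set()
    polls = 0
    while len(seen) < expected and polls < max_polls:
        seen.update(receive_sample(servers, sample_size, rng))
        polls += 1
    return seen, polls

rng = random.Random(8)  # fixed seed so that repeated runs behave identically
messages = [f"msg-{i}" for i in range(5)]
servers = make_servers(messages, num_servers=10, rng=rng)

first = receive_sample(servers, sample_size=3, rng=rng)
seen, polls = poll_until_complete(servers, len(messages), sample_size=3, rng=rng)

print(f"one poll returned {len(first)} of {len(messages)} messages")
print(f"all {len(seen)} messages were seen after {polls} further polls")
```

A single poll may return some, all, or none of the five messages, but repeated polling eventually covers every server. A receiver should therefore treat an empty or short result as "nothing found on the sampled servers," not as proof that the queue is empty.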

Messages may not be delivered quickly

You cannot rely on SQS in scenarios where messages must be delivered immediately or very quickly. The delivery of SQS messages is delayed by at least the amount of time it takes for receivers to poll the service for new messages. In some cases, messages will be delayed further if they are stored on a server that has become temporarily unavailable or that is not in the subset sampled by a retrieval request. As a general guide, you can expect messages to take from 2 to 10 seconds to be delivered by SQS under normal circumstances.

Messages may be delivered out of order

SQS cannot guarantee that messages will be delivered in the same order they were sent. Although the service will attempt to keep your messages in order, this is not always possible in a distributed system. If your application requires messages to be processed in a particular order, you will have to build order-checking functionality into your application to use SQS safely.
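One possible form of order checking is a resequencer on the receiving side. The sketch below assumes that the sender numbers its messages consecutively and includes that number with each message; this convention, and the `Resequencer` class itself, are our own illustration and not a feature of SQS.

```python
class Resequencer:
    """Release messages in sequence-number order, buffering early arrivals."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # the sequence number we are waiting for
        self.pending = {}          # seq -> body for messages that arrived early

    def accept(self, seq, body):
        """Record one arrival and return the bodies now ready, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # already released or already buffered: ignore
        self.pending[seq] = body
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

reseq = Resequencer()
# Messages arrive out of order, and message 2 is delivered twice.
arrivals = [(1, "b"), (0, "a"), (3, "d"), (2, "c"), (2, "c")]
processed = []
for seq, body in arrivals:
    processed.extend(reseq.accept(seq, body))

print(processed)  # → ['a', 'b', 'c', 'd']
```

The buffer holds messages in memory, so this approach suits a single receiver. With several receivers sharing one queue, the sequence state would have to live somewhere they can all reach.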

Messages may be redelivered

SQS decides whether or not to deliver a particular message based on two criteria: whether it still exists in the system and its visibility state, a property we will describe shortly. Because it is impossible to guarantee that information about a message’s state or life-cycle status will always be synchronized between all the servers in the distributed SQS system, your application must gracefully handle the redelivery of messages that ought to be invisible or deleted.
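A common way to handle redelivery gracefully is to make message processing idempotent, so that a second delivery of the same message is harmless. The following is a minimal sketch, assuming each delivery carries the message's ID; the function `make_idempotent_handler` and the sample message IDs are hypothetical.

```python
def make_idempotent_handler(process):
    """Wrap `process` so that each message ID is handled at most once."""
    handled_ids = set()  # in-memory only; see the note after this example

    def handle(message_id, body):
        if message_id in handled_ids:
            return False  # a redelivery: skip the work
        process(body)
        handled_ids.add(message_id)  # recorded only after processing succeeds
        return True

    return handle

results = []
handle = make_idempotent_handler(results.append)

deliveries = [
    ("id-1", "resize image 17"),
    ("id-2", "resize image 18"),
    ("id-1", "resize image 17"),  # the same message delivered a second time
]
outcomes = [handle(message_id, body) for message_id, body in deliveries]

print(outcomes)  # → [True, True, False]
print(results)   # → ['resize image 17', 'resize image 18']
```

The set of handled IDs here lives in one process and is lost on restart. An application with several receivers would need to keep this record in storage that all of them share, or design the work itself so that repeating it has no ill effect.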

Here are the most important guidelines that SQS application developers should follow to take advantage of the service’s strengths and avoid the drawbacks of its distributed architecture:

Most importantly, try to forget any preconceptions you may have about how messaging systems work. SQS has different capabilities from most other messaging systems, and it does not aim to provide a solution for timely, ordered, and once-only delivery of messages between application components. If you need a messaging system that provides these features, you will either have to implement them yourself by adding a layer of business logic on top of SQS, or you will need to use an alternative system that already does all this work for you.

One situation in which SQS works particularly well is when it is used to deliver work items to the components of a distributed application. For example, a director component might send messages with task instructions to a pool of worker components that receive and process these messages as they arrive. In this scenario, the order and time frame of message delivery is not as important as ensuring that all messages are eventually delivered at least once.

SQS is designed to allow multiple clients to receive messages from a single queue. The service aims to deliver each message only once, though it will deliver a message multiple times if this is necessary to ensure that the message is properly processed and acknowledged. This approach means that messages are not lost, even if a message-receiving component crashes or loses network connectivity before it has finished processing a message.

To manage the delivery of messages, the service maintains state information about each message that indicates whether or not it should be delivered to potential receivers. This state information is called the message’s visibility. A message may be visible or invisible. While a message is invisible, it remains in the queue but will not be delivered to message receivers until it becomes visible again. The state of a message is changed from visible to invisible each time the message is received by a client; this prevents the message from being received by another client straightaway. The change to the invisible state is only temporary, and after a set amount of time SQS will make the message visible again. Figure 8-2 shows the main events in a message’s life cycle.

The time interval for which a message will remain invisible is called its visibility timeout. The visibility timeout of a message is managed automatically by SQS queues, or it may be modified directly by API operations on the service.

The visibility timeout of a message is measured in seconds, and it can be a value from zero (in which case the message is in the visible state) up to 86,400 seconds, which means the message will remain invisible for a full 24 hours. The duration of the timeout may be set on a per-queue or per-message basis. A queue’s default visibility timeout setting determines how long a message remains invisible when it is delivered, though this value can be overridden with a message-specific timeout at any point in the message’s lifetime. Because a message only remains invisible for a limited time, the only way to prevent the message from being eventually redelivered is to delete it from the queue.
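The visibility rules can be modeled in a few lines. The `ToyQueue` class below is a hypothetical in-memory model, not the SQS API: it ignores server sampling, and the caller supplies the current time in seconds explicitly so that the behavior is easy to follow.

```python
class ToyQueue:
    """In-memory model of message visibility; an illustration, not the SQS API."""

    MAX_TIMEOUT = 86400  # seconds; the upper limit for a visibility timeout

    def __init__(self, default_timeout=30):
        self.default_timeout = default_timeout
        self.messages = {}         # message ID -> body
        self.invisible_until = {}  # message ID -> time it becomes visible again
        self.next_id = 0

    def send(self, body):
        self.next_id += 1
        message_id = f"m{self.next_id}"
        self.messages[message_id] = body
        self.invisible_until[message_id] = 0  # new messages start out visible
        return message_id

    def receive(self, now, timeout=None):
        """Return (id, body) for each visible message, then hide those messages."""
        timeout = self.default_timeout if timeout is None else timeout
        if not 0 <= timeout <= self.MAX_TIMEOUT:
            raise ValueError("visibility timeout must be 0 to 86400 seconds")
        delivered = []
        for message_id, body in self.messages.items():
            if self.invisible_until[message_id] <= now:
                self.invisible_until[message_id] = now + timeout
                delivered.append((message_id, body))
        return delivered

    def delete(self, message_id):
        self.messages.pop(message_id, None)
        self.invisible_until.pop(message_id, None)

queue = ToyQueue(default_timeout=30)
mid = queue.send("task A")

first = queue.receive(now=0)    # delivered; invisible until t=30
hidden = queue.receive(now=10)  # nothing returned: the message is invisible
again = queue.receive(now=30)   # timeout has expired: the message is redelivered
queue.delete(mid)
gone = queue.receive(now=100)   # deleted, so never delivered again

print(first, hidden, again, gone)
```

In this model a receiver that crashes after the first `receive` simply never calls `delete`, and the message reappears for another receiver once the timeout expires.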

Note

The content of a message can be viewed at any time using a peek operation, even when the message is invisible and cannot be retrieved with a standard receive request.

To make the most efficient use of SQS messaging in an application, it is vital to apply the appropriate visibility timeout values to your messages. If the timeout is too short, a message could be redelivered before the original recipient has had enough time to process and delete it, resulting in unnecessary and wasteful reprocessing. If the timeout is too long, the redelivery and processing of messages will be delayed unnecessarily when a component that has already received some messages fails or loses connectivity. Ideally, a message’s visibility timeout setting should match the time it will take to process that message and delete it from the queue.

To understand how a distributed application may be built around SQS messaging, we should look at the different roles that may be played by SQS clients. A client may perform one or more of the following tasks: send messages, receive messages, or manage and monitor SQS resources.

Message sender

A message sender contacts SQS, asks it to create a new message in a specific queue, and provides the data that will make up the content of the message. Once the service has acknowledged the receipt of the message, the sender can be certain that the message will be delivered at least once to a message receiver watching that queue.

When a message is sent, the sender is provided with an ID string that uniquely identifies the message in the target queue. This ID can be used to perform operations on the message, such as viewing its contents, changing its visibility state, or deleting the message altogether.

Message receiver

SQS is available via a web service interface that requires clients to initiate a connection to the service to perform actions or receive information. This means that clients of SQS must actively contact the service to receive messages; there is no mechanism for notifying message receivers when new messages become available. Message receivers must poll the service at intervals to receive new messages.

A message receiver client contacts the service and asks it to provide one or more messages from a specific queue. If there are messages in the queue that are visible and are stored on one of the SQS servers sampled by the operation, these messages are returned to the receiver. If there are no messages available, the receiver will generally wait for some amount of time before contacting the service again.
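A receiver's polling loop might be structured as follows. This is a sketch only: `receive` stands in for whatever call your SQS library provides to fetch messages, and the doubling back-off policy with its `idle_wait` and `max_wait` parameters is our own choice, not something the service prescribes.

```python
import time

def poll(receive, handle, idle_wait=1.0, max_wait=30.0,
         max_polls=None, sleep=time.sleep):
    """Call `receive` repeatedly, waiting longer each time it returns nothing."""
    wait = idle_wait
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        messages = receive()
        if messages:
            for message in messages:
                handle(message)
            wait = idle_wait  # reset the delay after a productive poll
        else:
            sleep(wait)
            wait = min(wait * 2, max_wait)  # back off while nothing arrives
    return polls

# Demonstration with a stand-in for the service: two empty results,
# then two messages, then another empty result.
batches = iter([[], [], ["task 1", "task 2"], []])
handled = []
waits = []  # record the requested delays instead of actually sleeping
poll(lambda: next(batches), handled.append, max_polls=4, sleep=waits.append)

print(handled)  # → ['task 1', 'task 2']
print(waits)    # → [1.0, 2.0, 1.0]
```

Because a retrieval samples only some servers, an empty result does not prove that the queue is empty. Keeping `max_wait` modest limits how long a message on an unsampled server has to wait.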

When there are messages available, the receiver can obtain the data content of the message from the service’s response and process this information. The receiver also obtains the identifier for each message it receives, and it can use this identifier to perform a follow-up operation on the message. Here are the actions a message receiver may perform on a message, depending on whether the message was successfully processed.

Administrator

An administrator client performs the management tasks necessary to keep the messaging infrastructure running smoothly. These tasks may include creating new queues, defining the default visibility timeout settings for a queue, and configuring a queue’s access control settings.

In addition to one-off management tasks like these, an administrator may undertake monitoring and maintenance tasks. For example, an administrator may monitor the number of messages stored in a queue to determine whether the messages are being processed quickly enough to keep up with demand.

In February of 2008, Amazon Web Services released a redesigned version of the SQS API that provides for lower usage costs. This new API, version 2008-01-01, offers an updated pricing model that will make the service cheaper for most users. However, it also includes significant changes that are incompatible with the service’s previous APIs and any libraries or tools based on them. Between February 6, 2008 and May 6, 2009, SQS developers can use either the previous API versions or the newest API. After May 6, 2009, only the 2008-01-01 API version will remain available.

Because the new API was released late in this book’s production process, our discussion of SQS will be limited to the superseded API version 2007-05-01, for which third-party libraries and tools were available at the time of writing. In this section we will briefly describe the new API and how it differs from previous versions. Readers who wish to take advantage of the new pricing model, or who are updating their applications in preparation for the mandatory switch-over on May 6, 2009, should bear these differences in mind when reading Chapters 8 and 9.

SQS account holders are billed differently depending on whether they use the 2008-01-01 API version or previous versions. The pricing schedule for previous API versions is described in “Pricing” above. The API version 2008-01-01 incurs fees based on the number of requests performed and the amount of data transferred into and out of Amazon’s network.

Requests

SQS incurs a fee of 0.0001¢ per request (1¢ for 10,000 requests).

This per-request fee replaces the per-message fee imposed by the older APIs. The request fee is only one hundredth of the prior message fee; however, it is charged for every API operation, including message retrieval requests. To take advantage of the new pricing model, minimize the number of requests you perform by polling for messages no more often than is strictly necessary.
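To see how polling frequency affects the request fee, consider the following rough calculation. It uses only the per-request rate quoted above (correct as of February 2008); the function name, the 30-day month, and the polling intervals are our own assumptions, and data transfer fees and the requests made to send or delete messages are ignored.

```python
from fractions import Fraction

# 1 cent per 10,000 requests, the rate quoted in the text (February 2008).
REQUEST_FEE_CENTS = Fraction(1, 10_000)

def monthly_polling_cost_cents(poll_interval_seconds, receivers=1, days=30):
    """Request fees, in cents, for receivers that poll at a fixed interval."""
    polls_per_receiver = days * 24 * 60 * 60 // poll_interval_seconds
    return polls_per_receiver * receivers * REQUEST_FEE_CENTS

for interval in (1, 10, 60):
    cents = monthly_polling_cost_cents(interval)
    print(f"polling every {interval} s: {float(cents):.2f} cents per receiver")
```

A receiver that polls once per second makes 2,592,000 requests in 30 days, which costs about 259 cents; polling once every 10 seconds reduces this to about 26 cents. These fees apply whether or not a poll returns any messages.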

Data Transferred

The data transfer rates for the new API are identical to the rates for previous versions. However, when you use the 2008-01-01 API, there are no fees for data transferred between the EC2 and SQS services.

Note

These prices are correct as of February 2008; refer to Amazon’s web site to confirm the latest pricing.

In order to lower the fees it charges for the SQS service, Amazon made changes to the service’s API to reduce its internal running costs. A number of service features and API operations are deprecated in the new version, the structure and content of the service’s input and output messages have changed, and some data and time limits have been tightened. These changes are not compatible with previous versions of the API. Libraries, tools and applications that were designed to work with the previous APIs will not work with the 2008-01-01 version. These changes may require you to design and manage your SQS application differently from the examples presented in this book.

Table 8-1 describes the limits that have changed between API versions 2007-05-01 and 2008-01-01, and the bulleted list below covers some of the other major changes applied in the new version. For a complete list of changes, refer to the article “Migrating to Amazon SQS API Version 2008-01-01” and to the latest API documentation available on Amazon’s web site.