Appendix E - Maximizing Scalability, Availability, and Performance


A key feature of the Windows Azure™ technology platform is the robustness that the platform provides. A typical Windows Azure solution is implemented as a collection of one or more roles, where each role is optimized for performing a specific category of tasks. For example, a web role is primarily useful for implementing the web front-end that provides the user interface of an application, while a worker role typically executes the underlying business logic such as performing any data processing required, interacting with a database, orchestrating requests to and from other services, and so on. If a role fails, Windows Azure can transparently start a new instance and the application can resume.

However, no matter how robust an application is, it must also perform and respond quickly. Windows Azure supports highly scalable services through the ability to dynamically start and stop instances of an application, enabling a Windows Azure solution to handle an influx of requests at peak times, while scaling back as the demand lowers, reducing the resources consumed and the associated costs.

Poe says:
If you are building a commercial system, you may have a contractual obligation to provide a certain level of performance to your customers. This obligation might be specified in a service level agreement (SLA) that guarantees the response time or throughput. In this environment, it is critical that you understand the architecture of your application, the resources that it utilizes, and the tools that Windows Azure provides for building and maintaining an efficient system.

However, scalability is not the only issue that affects performance and response times. If an application running in the cloud accesses resources and databases held in your on-premises servers, bear in mind that these items are no longer directly available over your local high-speed network. Instead, the application must retrieve this data across the Internet with its lower bandwidth, higher latency, and inherent unpredictability concerning reliability and throughput. This can result in increased response times for users running your applications or reduced throughput for your services.

Of course, if your application or service is now running remotely from your organization, it will also be running remotely from your users. This might not seem like much of an issue if you are building a public-facing website or service because the users would have been remote prior to you moving functionality to the cloud, but this change may impact the performance for users inside your organization who were previously accessing your solution over a local area network. Additionally, the location of an application or service can affect its perceived availability if the path from the user traverses network elements that are heavily congested, and network connectivity times out as a result. Finally, in the event of a catastrophic regional outage of the Internet or a failure at the datacenter hosting your applications and services, your users will be unable to connect.

This appendix considers issues associated with maintaining performance, reducing application response times, and ensuring that users can always access your application when you relocate functionality to the cloud. It describes solutions and good practice for addressing these concerns by using Windows Azure technologies.

Requirements and Challenges

The primary causes of extended response times and poor availability in a distributed environment are lack of resources for running applications, and network latency. Scaling can help to ensure that sufficient resources are available, but no matter how much effort you put into tuning and refining your applications, users will perceive that your system has poor performance if these applications cannot receive requests or send responses in a timely manner because the network is slow. A crucial task, therefore, is to organize your solution to minimize this network latency by making optimal use of the available bandwidth and utilizing resources as close as possible to the code and users that need them.

The following sections identify some common requirements concerning scalability, availability, and performance, summarizing many of the challenges you will face when you implement solutions to meet these requirements.

Managing Elasticity in the Cloud

Description: Your system must support a varying workload in a cost-effective manner.

Many commercial systems must support a workload that can vary considerably over time. For much of the time the load may be steady, with a regular volume of requests of a predictable nature. However, there may be occasions when the load dramatically and quickly increases. These peaks may arise at expected times; for example, an accounting system may receive a large number of requests as the end of each month approaches when users generate their month-end reports, and it may experience periods of increased usage towards the end of the financial year. In other types of application the load may surge unexpectedly; for example, requests to a news service may flood in if some dramatic event occurs.

The cloud is a highly scalable environment, and you can start new instances of a service to meet demand as the volume of requests increases. However, the more instances of a service you run, the more resources they occupy; and the costs associated with running your system rise accordingly. Therefore it makes economic sense to scale back the number of service instances and resources as demand for your system decreases.

How can you achieve this? One approach is to monitor the system and start up more service instances when the number of requests arriving in a given period of time exceeds a specified threshold value. If the load increases further, you can define additional thresholds and start yet more instances. If the volume of requests later falls below these threshold values, you can terminate the extra instances. In inactive periods, it might only be necessary to have a minimal number of service instances. However, there are a couple of challenges with this solution:

Bharath says:
Remember that starting and stopping service instances is not an instantaneous operation. It may take 10-15 minutes for Windows Azure to perform these tasks, so any performance measurements should include a predictive element based on trends over time, and initiate new service instances so that they are ready when required.
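The threshold approach and the predictive element described above can be sketched as follows. This is a minimal illustration, not part of any Windows Azure API: the function names, the use of a straight-line trend, and the idea of one extra instance per exceeded threshold are all assumptions made for the example.

```python
from typing import Sequence


def forecast_load(samples: Sequence[float], lead_intervals: int) -> float:
    """Extrapolate the load `lead_intervals` sampling periods ahead by
    fitting a least-squares straight line through the recent samples."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    if n == 1:
        return samples[0]
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + lead_intervals)


def target_instances(predicted_load: float, thresholds: Sequence[float],
                     min_instances: int = 1) -> int:
    """Hypothetical policy: every threshold that the predicted load
    exceeds adds one instance to the minimum count."""
    extra = sum(1 for threshold in thresholds if predicted_load > threshold)
    return min_instances + extra
```

If samples are taken every five minutes, a lead of three intervals looks 15 minutes ahead, which roughly matches the start-up time mentioned in the note, so that new instances are requested before the load actually arrives.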

Reducing Network Latency for Accessing Cloud Applications

Description: Users should be connected to the closest available instance of your application running in the cloud to minimize network latency and reduce response times.

A cloud application may be hosted in a datacenter in one part of the world, while a user connecting to the application may be located in another, perhaps on a different continent. The distance between users and the applications and services they access can have a significant bearing on the response time of the system. You should adopt a strategy that minimizes this distance and reduces the associated network latency for users accessing your system.

If your users are geographically dispersed, you could consider replicating your cloud applications and hosting them in datacenters that are similarly dispersed. Users could then connect to the closest available instance of the application. The question that you need to address in this scenario is how do you direct a user to the most local instance of an application?

Maximizing Availability for Cloud Applications

Description: Users should always be able to connect to the application running in the cloud.

How do you ensure that your application is always running in the cloud and that users can connect to it? Replicating the application across datacenters may be part of the solution, but consider the following issues:

Optimizing the Response Time and Throughput for Cloud Applications

Description: The response time for services running in the cloud should be as low as possible, and the throughput should be maximized.

Windows Azure is a highly scalable platform that offers high performance for applications. However, available computing power alone does not guarantee that an application will be responsive. An application that is designed to function in a serial manner will not make best use of this platform and may spend a significant period blocked waiting for slower, dependent operations to complete. The solution is to perform these operations asynchronously, and this approach has been described throughout this guide.
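As a language-neutral illustration of the difference between serial and asynchronous execution, the following Python sketch uses the standard asyncio module. The `fetch` coroutine is a hypothetical stand-in for any slow, dependent operation; it is not taken from this guide.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # Stand-in for a slow, dependent operation such as a remote service call.
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_serially(ops):
    # Each operation starts only after the previous one has completed,
    # so the total time is the sum of the individual delays.
    return [await fetch(name, delay) for name, delay in ops]


async def run_concurrently(ops):
    # All operations are started together, so the total time is
    # approximately that of the slowest operation.
    return await asyncio.gather(*(fetch(name, delay) for name, delay in ops))
```

Both functions return the same results in the same order; only the elapsed time differs.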

Aside from the design and implementation of the application logic, the key factor that governs the response time and throughput of a service is the speed with which it can access the resources it needs. Some or all of these resources might be located remotely in other datacenters or on-premises servers. Operations that access remote resources may require a connection across the Internet. To mitigate the effects of network latency and unpredictability, you can cache these resources locally to the service, but this approach leads to two obvious questions:

Caching is also a useful strategy for reducing contention to shared resources and can improve the response time for an application even if the resources that it utilizes are local. However, the issues associated with caching remain the same; specifically, if a local resource is modified the cached data is now out of date.

Bharath says:
The cloud is not a magic remedy for speeding up applications that are not designed with performance and scalability in mind.

Windows Azure and Related Technologies

Windows Azure provides a number of technologies that can help you to address the challenges presented by each of the requirements in this appendix:

The following sections describe the Enterprise Library Autoscaling Application Block, Windows Azure Traffic Manager, and Windows Azure Caching, and provide guidance on how to use them in a number of scenarios.

Managing Elasticity in the Cloud by Using the Microsoft Enterprise Library Autoscaling Application Block

It is possible to implement a custom solution that manages the number of deployed instances of the web and worker roles your application uses. However, this is far from a simple task and so it makes sense to consider using a prebuilt library that is sufficiently flexible and configurable to meet your requirements.

Poe says:
External services that can manage autoscaling do exist but you must provide these services with your management certificate so that they can access the role instances, which may not be an acceptable approach for your organization.

The Enterprise Library Autoscaling Application Block (also known as "Wasabi") provides such a solution. It is part of the Microsoft Enterprise Library 5.0 Integration Pack for Windows Azure, and can automatically scale your Windows Azure application or service based on rules that you define specifically for that application or service. You can use these rules to help your application or service maintain its throughput in response to changes in its workload, while at the same time minimize and control hosting costs.

Scaling operations typically alter the number of role instances in your application, but the block also enables you to use other scaling actions such as throttling certain functionality within your application. This means that there are opportunities to achieve very subtle control of behavior based on a range of predefined and dynamically discovered conditions. The Autoscaling Application Block enables you to specify the following types of rules:

Rules are defined in XML format and can be stored in Windows Azure blob storage, in a file, or in a custom store that you create.

By applying a combination of these rules you can ensure that your application or service will meet demand and load requirements, even during the busiest periods, to conform to SLAs, minimize response times, and ensure availability while still minimizing operating costs.

How the Autoscaling Application Block Manages Role Instances

The Autoscaling Application Block can monitor key performance indicators in your application roles and automatically deploy or remove instances. For example, Figure 1 shows how the number of instances of a role may change over time within the boundaries defined for the minimum and maximum number of instances.

Figure 1 - Data visualization of the scale boundaries and scale actions for a role


The behavior shown in Figure 1 was the result of the following configuration of the Autoscaling Application Block:

Poe says:
By specifying the appropriate set of rules for the Autoscaling Application Block you can configure automatic scaling of the number of instances of the roles in your application to meet known demand peaks and to respond automatically to dynamic changes in load and demand.

Constraint Rules

Constraint rules are used to proactively scale your application for the expected demand, and at the same time constrain the possible instance count, so that reactive rules do not change the instance count outside of that boundary. There is a comprehensive set of options for specifying the range of times for a constraint rule, including fixed periods and fixed durations, daily, weekly, monthly, and yearly recurrence, and relative recurring events such as the last Friday of each month.

Reactive Rules

Reactive rules specify the conditions and actions that change the number of deployed role instances or the behavior of the application. Each rule consists of one or more operands that define how the block matches the data from monitoring points with values you specify, and one or more actions that the block will execute when the operands match the monitored values.

Operands that define the data points for monitoring activity of a role can use any of the Windows® operating system performance counters, the length of a Windows Azure storage queue, and other built-in metrics. Alternatively you can create a custom operand that is specific to your own requirements, such as the number of unprocessed orders in your application.

The Autoscaling Application Block reads performance information collected by the Windows Azure diagnostics mechanism from Windows Azure storage. Windows Azure does not populate this with data from the Windows Azure diagnostics monitor by default; you must run code in your role when it starts or execute scripts while the application is running to configure the Windows Azure diagnostics to collect the required information and then start the diagnostics monitor.

Reactive rule conditions can use a wide range of comparison functions between operands to define the trigger for the related actions to occur. These functions include the typical greater than, greater than or equal, less than, less than or equal, and equal tests. You can also negate the tests using the not function, and build complex conditional expressions using AND and OR logical combinations.
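The interaction between constraint rules and reactive rules can be sketched conceptually as follows. This is not the Autoscaling Application Block API (the block is a .NET library whose rules are defined in XML); the class and function names here are hypothetical, and the choice to apply only the first matching rule is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class ConstraintRule:
    minimum: int  # lower bound on the instance count
    maximum: int  # upper bound on the instance count


@dataclass
class ReactiveRule:
    condition: Callable[[Metrics], bool]  # comparison over monitored operands
    change: int                           # +n to scale out, -n to scale in


def next_instance_count(current: int, metrics: Metrics,
                        constraint: ConstraintRule,
                        rules: List[ReactiveRule]) -> int:
    """Apply the first reactive rule whose condition matches, then clamp
    the result so that it stays inside the constraint boundary."""
    proposed = current
    for rule in rules:
        if rule.condition(metrics):
            proposed = current + rule.change
            break
    return max(constraint.minimum, min(constraint.maximum, proposed))
```

For example, a rule with the condition `lambda m: m["cpu"] > 80 and m["queue"] > 100` corresponds to an AND combination of two greater-than tests.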

Actions

The Autoscaling Application Block provides the following types of actions:

Poe says:
You can use the Autoscaling Application Block to force your application to change its behavior automatically to meet changes in load and demand. The block can change the settings in the service configuration file, and the application can react to this to reduce its demand on the underlying infrastructure.

The Autoscaling Application Block logs events that relate to scaling actions and can send notification emails in response to the scaling of a role, or instead of scaling the role, if required. You can also configure several aspects of the way that the block works, such as the scheduler that controls the monitoring and scaling activities, and the stabilizer that enforces "cool down" delays between actions to prevent repeated oscillation and optimize instance counts around the hourly boundary.
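The idea behind a "cool down" delay can be illustrated with a small sketch. The `Stabilizer` class below is hypothetical and only shows the general technique of suppressing actions that follow too closely after the previous one; it does not reproduce the block's actual stabilizer or its hourly-boundary optimization.

```python
from typing import Optional


class Stabilizer:
    """Rejects a scaling action if the previous accepted action
    happened less than `cooldown_seconds` ago."""

    def __init__(self, cooldown_seconds: float):
        self.cooldown_seconds = cooldown_seconds
        self._last_action_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        if (self._last_action_at is not None
                and now - self._last_action_at < self.cooldown_seconds):
            return False  # still cooling down; ignore this request
        self._last_action_at = now
        return True
```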


Note:
For more information, see "Microsoft Enterprise Library 5.0 Integration Pack for Windows Azure" on MSDN.

Guidelines for Using the Autoscaling Application Block

The following guidelines will help you understand how you can obtain the most benefit from using the Autoscaling Application Block:

Reducing Network Latency for Accessing Cloud Applications with Windows Azure Traffic Manager

Windows Azure Traffic Manager is a Windows Azure service that enables you to set up request routing and load balancing based on predefined policies and configurable rules. It provides a mechanism for routing requests to multiple deployments of your Windows Azure-hosted applications and services, irrespective of the datacenter location. The applications or services could be deployed in one or more datacenters.

Windows Azure Traffic Manager monitors the availability and network latency of each application you configure in a policy, on any HTTP or HTTPS port. If it detects that an application is offline it will not route any requests to it. However, it continues to monitor the application at 30 second intervals and will start to route requests to it, based on the configured load balancing policy, if it again becomes available.

Windows Azure Traffic Manager does not mark an application as offline until it has failed to respond three times in succession. This means that the total time between a failure and that application being marked as offline is three times the monitoring interval you specify.
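The detection behavior described above can be modeled with a simple counter. This sketch is an illustration only; the class and function names are hypothetical and the code does not call any Windows Azure service.

```python
class EndpointMonitor:
    """Tracks probe results and marks an endpoint offline only after a
    number of consecutive failures (three in the text above)."""

    def __init__(self, failures_before_offline: int = 3):
        self.failures_before_offline = failures_before_offline
        self.consecutive_failures = 0
        self.online = True

    def record_probe(self, succeeded: bool) -> bool:
        if succeeded:
            # Any successful probe resets the counter and restores the endpoint.
            self.consecutive_failures = 0
            self.online = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_before_offline:
                self.online = False
        return self.online


def seconds_until_offline(interval_seconds: int = 30, failures: int = 3) -> int:
    """Detection time as stated in the text: failures x monitoring interval."""
    return interval_seconds * failures
```

With a 30 second monitoring interval and three failed probes, the detection time works out to 90 seconds.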

How Windows Azure Traffic Manager Routes Requests

Windows Azure Traffic Manager is effectively a DNS resolver. When you use Windows Azure Traffic Manager, web browsers and services accessing your application will perform a DNS query to Windows Azure Traffic Manager to resolve the IP address of the endpoint to which they will connect, just as they would when connecting to any other website or resource.

Bharath says:
Windows Azure Traffic Manager does not perform HTTP redirection or use any other browser-based redirection technique because this would not work with other types of requests, such as from smart clients accessing web services exposed by your application. Instead, it acts as a DNS resolver that the client queries to obtain the IP address of the appropriate application endpoint. Windows Azure Traffic Manager returns the IP address of the deployed application that best satisfies the configured policy and rules.

Windows Azure Traffic Manager uses the requested URL to identify the policy to apply, and returns an IP address resulting from evaluating the rules and configuration settings for that policy. The user's web browser or the requesting service then connects to that IP address, effectively routing them based on the policy you select and the rules you define.

This means that you can offer users a single URL that is aliased to the address of your Windows Azure Traffic Manager policy. For example, you could use a CNAME record in your own or your ISP's DNS to map the URL you want to expose to users of your application, such as http://store.treyresearch.net, to the entry point of your Windows Azure Traffic Manager policy. If you have named your Windows Azure Traffic Manager namespace treyresearch and have a policy for the Orders application named ordersapp, you would map the URL in your DNS to http://ordersapp.treyresearch.trafficmanager.net. All DNS queries for store.treyresearch.net will be passed to Windows Azure Traffic Manager, which will perform the required routing by returning the IP address of the appropriate deployed application. Figure 2 illustrates this scenario.

Figure 2 - How Windows Azure Traffic Manager performs routing and redirection


The default time-to-live (TTL) value for the DNS responses that Windows Azure Traffic Manager will return to clients is 300 seconds (five minutes). When this interval expires, any requests made by a client application may need to be resolved again, and the new address that results can be used to connect to the service. For testing purposes you may want to reduce this value, but you should use the default or longer in a production scenario.

Remember that there may be intermediate DNS servers between clients and Windows Azure Traffic Manager that are likely to cache the DNS record for the time you specify. In addition, client applications and web browsers often cache the DNS entries they obtain, and so will not be redirected to a different application deployment until their cached entries expire.
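The effect of the TTL on clients can be seen in a small model of a caching resolver. The `TtlDnsCache` class is hypothetical; the resolver function and the clock are passed in as parameters so that the behavior can be demonstrated without real DNS traffic.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Returns a cached address until its time-to-live expires, and only
    then asks the underlying resolver again."""

    def __init__(self, resolver: Callable[[str], str],
                 ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic):
        self._resolver = resolver
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str) -> str:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and entry[1] > now:
            return entry[0]  # cached entry is still valid; no new query
        address = self._resolver(host)
        self._entries[host] = (address, now + self._ttl)
        return address
```

A client behaving like this continues to use the old address for up to the full TTL after a deployment goes offline, which is why a shorter TTL gives faster failover at the cost of more DNS queries.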

Using Monitoring Endpoints

When you configure a policy in Windows Azure Traffic Manager you specify the port and relative path and name for the endpoint that Windows Azure Traffic Manager will access to test if the application is responding. By default this is port 80 and "/" so that Windows Azure Traffic Manager tests the root path of the application. As long as it receives an HTTP "200 OK" response within ten seconds, Windows Azure Traffic Manager will assume that the hosted service is online.

You can specify a different value for the relative path and name of the monitoring endpoint if required. For example, if you have a page that performs a test of all functions in the application you can specify this as the monitoring endpoint. Hosted applications and services can be included in more than one policy in Windows Azure Traffic Manager, so it is a good idea to have a consistent name and location for the monitoring endpoints in all your applications and services so that the relative path and name is the same and can be used in any policy.

Markus says:
If you implement special monitoring pages in your applications, ensure that they can always respond within ten seconds so that Windows Azure Traffic Manager does not mark them as being offline. Also consider the impact on the overall operation of the application of the processes you execute in the monitoring page.
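A monitoring endpoint of the kind described above might look like the following sketch, written with Python's standard http.server module purely for illustration (a real Windows Azure web role would typically implement this as a page or handler in the web application itself). The path `/monitor` and the `check_dependencies` function are assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

MONITOR_PATH = "/monitor"  # hypothetical path, kept the same in every application


def check_dependencies() -> bool:
    """Placeholder for quick health checks. Keep these lightweight so
    that the response is always returned well within ten seconds."""
    return True


def monitor_response(path: str, healthy: bool) -> Tuple[int, bytes]:
    """Decide the HTTP status and body for a probe request."""
    if path != MONITOR_PATH:
        return 404, b"Not Found"
    if healthy:
        return 200, b"OK"
    return 503, b"Unavailable"


class MonitorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = monitor_response(self.path, check_dependencies())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 0) -> HTTPServer:
    # Port 0 asks the operating system for any free port.
    return HTTPServer(("127.0.0.1", port), MonitorHandler)
```

Calling `make_server(8080).serve_forever()` would start the endpoint locally; only a 200 response counts as "online" for the probe described in the text.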

If Windows Azure Traffic Manager detects that every service defined for a policy is offline, it will act as though they were all online, and continue to hand out IP addresses based on the type of policy you specify. This ensures that clients will still receive an IP address in response to a DNS query, even if the service is unreachable.

Windows Azure Traffic Manager Policies

At the time of writing Windows Azure Traffic Manager offers the following three routing and load balancing policies, though more may be added in the future:

To minimize network latency and maximize performance you will typically use the Performance policy to redirect all requests from all users to the application in the closest datacenter. The following sections describe the Performance policy. The other policies are described in the section "Maximizing Availability for Cloud Applications with Windows Azure Traffic Manager" later in this appendix.

Guidelines for Using Windows Azure Traffic Manager

The following list contains general guidelines for using Windows Azure Traffic Manager:

Guidelines for Using Windows Azure Traffic Manager to Reduce Network Latency

The following list contains guidelines for using Windows Azure Traffic Manager to reduce network latency:

If all of the hosted applications or services in a Performance policy are offline or unavailable (or availability cannot be tested due to a network or other failure), Windows Azure Traffic Manager will act as though all were online and route requests based on its internal measurements of global network latency based on the location of the client making the request. This means that clients will be able to access the application if it actually is online, or as soon as it comes back online, without the delay while Windows Azure Traffic Manager detects this and starts redirecting users based on measured latency.
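The selection logic described above, including the fallback when every deployment appears to be offline, can be sketched as follows. The function name and the latency table are hypothetical; Windows Azure Traffic Manager uses its own internal latency measurements, which are not modeled here.

```python
from typing import Dict


def pick_performance_endpoint(latency_ms: Dict[str, float],
                              online: Dict[str, bool]) -> str:
    """Return the online deployment with the lowest latency for the
    client. If no deployment is online, treat all of them as online."""
    candidates = [name for name in latency_ms if online.get(name, False)]
    if not candidates:
        candidates = list(latency_ms)  # all offline: act as though all were online
    return min(candidates, key=lambda name: latency_ms[name])
```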

Limitations of Using Windows Azure Traffic Manager

The following list identifies some of the limitations you should be aware of when using Windows Azure Traffic Manager:

Maximizing Availability for Cloud Applications with Windows Azure Traffic Manager

Windows Azure Traffic Manager provides two policies that you can use to maximize availability of your applications. You can use the Round Robin policy to distribute requests to all application deployments that are currently responding to requests (applications that have not failed). Alternatively, you can use the Failover policy to ensure that a backup deployment of the application will receive requests should the primary one fail. These two policies provide opportunities for two very different approaches to maximizing availability:

Guidelines for Using Windows Azure Traffic Manager to Maximize Availability

The following list contains guidelines for using Windows Azure Traffic Manager to maximize availability. Also see the sections "Guidelines for Using Windows Azure Traffic Manager" and "Limitations of Using Windows Azure Traffic Manager" earlier in this appendix.

If all of the hosted applications or services in a Round Robin policy are offline or unavailable (or availability cannot be tested due to a network or other failure), Windows Azure Traffic Manager will act as though all were online and will continue to route requests to each configured application in turn. If all of the applications in a Failover policy are offline or unavailable, Windows Azure Traffic Manager will act as though the first one in the configured list is online and will route all requests to this one.
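The Round Robin and Failover behavior described above, including what happens when every deployment is offline, can be modeled in a few lines. The names below are hypothetical and the sketch ignores DNS caching; it only illustrates the selection rules.

```python
from typing import List, Set


class RoundRobinPolicy:
    """Hands out online deployments in turn. If none are online, it
    behaves as though all of them were."""

    def __init__(self, endpoints: List[str]):
        self._endpoints = list(endpoints)
        self._next = 0

    def pick(self, online: Set[str]) -> str:
        candidates = [e for e in self._endpoints if e in online] or self._endpoints
        choice = candidates[self._next % len(candidates)]
        self._next += 1
        return choice


def pick_failover(endpoints: List[str], online: Set[str]) -> str:
    """Return the first online deployment in the configured order. If
    none are online, return the first one in the list."""
    for endpoint in endpoints:
        if endpoint in online:
            return endpoint
    return endpoints[0]
```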

Note:
For more information about Windows Azure Traffic Manager, see "Windows Azure Traffic Manager."

Optimizing the Response Time and Throughput for Cloud Applications by Using Windows Azure Caching

Windows Azure Caching service provides a scalable, reliable mechanism that enables you to retain frequently used data physically close to your applications and services. Windows Azure Caching runs in the cloud, and you can cache data in the same datacenter that hosts your code. If you deploy services to more than one datacenter, you should create a separate cache in each datacenter, and each service should access only the co-located cache. In this way, you can reduce the overhead associated with repeatedly accessing remote data, eliminate the network latency associated with remote data access, and improve the response times for applications referencing this data.

However, caching does not come without cost. Caching data means creating one or more copies of that data, and as soon as you make these copies you have concerns about what happens if you modify this data. Any updates have to be replicated across all copies, but it can take time for these updates to ripple through the system. This is especially true on the Internet where you also have to consider the possibility of network errors causing updates to fail to propagate quickly. So, although caching can improve the response time for many operations, it can also lead to issues of consistency if two instances of an item of data are not identical. Consequently, applications that use caching effectively should be designed to cope with data that may be stale but that eventually becomes consistent.

Do not use Windows Azure Caching for code that executes on-premises as it will not improve the performance of your applications in this environment. In fact, it will likely slow your system down due to the network latency involved in connecting to the cache in the cloud. If you need to implement caching for on-premises applications, you should consider using Windows Server AppFabric Caching instead. For more information, see "Windows Server AppFabric Caching Features."

Bharath says:
Windows Azure Caching is primarily intended for code running in the cloud, such as web and worker roles. To gain the maximum benefit, implement Windows Azure Caching in the same datacenter that hosts your code.

Provisioning and Sizing a Windows Azure Cache

Windows Azure Caching is a service that is maintained and managed by Microsoft; you do not have to install any additional software or implement any infrastructure within your organization to use it. An administrator can easily provision an instance of the Caching service by using the Windows Azure Management Portal. The portal enables an administrator to select the location of the Caching service and specify the resources available to the cache. You indicate the resources to provision by selecting the size of the cache. Windows Azure Caching supports a number of predefined cache sizes, ranging from 128MB up to 4GB. Note that the bigger the cache size the higher the monthly charge.

The size of the cache also determines a number of other quotas. These quotas ensure fair usage of resources by limiting the number of cache reads and writes per hour, the available bandwidth per hour, and the number of concurrent connections; the bigger the cache, the more of these resources are available. For example, if you select a 128MB cache, you can currently perform up to 40,000 cache reads and writes per hour, use up to 1,400MB of bandwidth per hour, and open up to 10 concurrent connections. If you select a 4GB cache, you can perform up to 12,800,000 reads and writes per hour, use up to 44,800MB of bandwidth per hour, and open up to 160 concurrent connections.

Note:
The values specified here are correct at the time of writing, but these quotas are constantly under review and may be revised in the future. You can find information about the current production quota limits and prices at "Windows Azure Shared Caching FAQ."

You can create as many caches as your applications require, and they can be of different sizes. However, for maximum cost effectiveness you should carefully estimate the amount of cache memory your applications will require and the volume of activity that they will generate. You should also consider the lifetime of objects in the cache. By default, objects expire after 48 hours and will then be removed. You cannot change this expiration period for the cache as a whole, although you can override it on an object by object basis when you store them in the cache. However, be aware that the longer an object resides in cache the more likely it is to become inconsistent with the original data source (referred to as the "authoritative" source) from which it was populated.
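The expiration behavior described above can be modeled with a small in-memory cache. This sketch is not the Windows Azure Caching API; the `ExpiringCache` class is hypothetical and simply shows a default expiration period of 48 hours that can be overridden for individual objects.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

DEFAULT_TTL_SECONDS = 48 * 60 * 60  # default expiration stated in the text


class ExpiringCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}

    def put(self, key: str, value: Any,
            ttl_seconds: float = DEFAULT_TTL_SECONDS) -> None:
        # A per-object ttl_seconds overrides the cache-wide default.
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]  # expired objects are removed
            return None
        return value
```

A short expiration period suits data that changes often in the authoritative source, while slowly changing reference data can safely use the default.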

To assess the amount of memory needed, for each type of object that you will be storing:

  1. Measure the size in bytes of a typical instance of the object (serialize objects by using the NetDataContractSerializer class and write them to a file),
  2. Add a small overhead (approximately 1%) to allow for the metadata that the Caching service associates with each object,
  3. Round this value up to the next multiple of 1024 bytes (cache memory is allocated to objects in 1KB chunks),
  4. Multiply this value by the maximum number of instances that you anticipate caching.

Sum the results for each type of object to obtain the required cache size. Note that the Management Portal enables you to monitor the current and peak sizes of the cache, and you can change the size of a cache after you have created it without stopping and restarting any of your services. However, the change is not immediate and you can only request to resize the cache once a day. Also, you can increase the size of a cache without losing objects from the cache, but if you reduce the cache size some objects may be evicted.
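The sizing steps above can be expressed as a short calculation. The sketch assumes that you have already measured the serialized size of a typical instance of each object type; the `ObjectProfile` class and function names are hypothetical, and the 1% overhead is the approximate figure given in the steps.

```python
from dataclasses import dataclass
from typing import Iterable

CHUNK_BYTES = 1024       # the cache is allocated to objects in 1KB chunks
OVERHEAD_PERCENT = 1     # approximate metadata overhead per object


@dataclass
class ObjectProfile:
    name: str
    serialized_bytes: int  # measured size of a typical instance
    max_instances: int     # maximum number of instances you expect to cache


def bytes_per_object(serialized_bytes: int) -> int:
    # Step 2: add the overhead, rounding up to a whole byte.
    with_overhead = (serialized_bytes * (100 + OVERHEAD_PERCENT) + 99) // 100
    # Step 3: round up to the next multiple of 1024 bytes.
    chunks = (with_overhead + CHUNK_BYTES - 1) // CHUNK_BYTES
    return chunks * CHUNK_BYTES


def estimate_cache_bytes(profiles: Iterable[ObjectProfile]) -> int:
    # Step 4 for each type, then sum the results across all types.
    return sum(bytes_per_object(p.serialized_bytes) * p.max_instances
               for p in profiles)
```

For example, a 1,000-byte object grows to 1,010 bytes with overhead and therefore occupies one 1KB chunk, so 100 such objects need 102,400 bytes of cache.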

You should also carefully consider the other elements of the cache quota, and if necessary select a bigger cache size even if you do not require the volume of memory indicated. For example, if you exceed the number of cache reads and writes permitted in an hour, any subsequent read and write operations will fail with an exception. Similarly, if you exceed the bandwidth quota, applications will receive an exception the next time they attempt to access the cache. If you reach the connection limit, your applications will not be able to establish any new connections until one or more existing connections are closed.
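When choosing a cache size, it can help to compare the expected hourly workload with the quota for each size. The sketch below uses only the two quota examples quoted earlier in this section (128MB and 4GB); these values were correct at the time of writing and may have changed. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Quota:
    transactions_per_hour: int   # cache reads and writes
    bandwidth_mb_per_hour: int
    concurrent_connections: int


# Quota figures quoted in this appendix; treat them as examples only.
QUOTAS: Dict[str, Quota] = {
    "128MB": Quota(40_000, 1_400, 10),
    "4GB": Quota(12_800_000, 44_800, 160),
}


def exceeded_limits(workload: Quota, cache_size: str) -> List[str]:
    """Return the names of the quota elements that the expected
    workload would exceed for the given cache size."""
    quota = QUOTAS[cache_size]
    exceeded = []
    if workload.transactions_per_hour > quota.transactions_per_hour:
        exceeded.append("transactions")
    if workload.bandwidth_mb_per_hour > quota.bandwidth_mb_per_hour:
        exceeded.append("bandwidth")
    if workload.concurrent_connections > quota.concurrent_connections:
        exceeded.append("connections")
    return exceeded
```

An empty result means the workload fits within every element of the quota; any other result suggests selecting a bigger cache even if the memory itself is sufficient.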

NoteMarkus says:
Markus Windows Azure Caching enables an application to pool connections. When connection pooling is configured, the same pool of connections is shared for a single application instance. Using connection pooling can improve the performance of applications that use the Caching service, but you should consider how this affects your total connection requirements based on the number of instances of your application that may be running concurrently. For more information, see "Understanding and Managing Connections in Windows Azure".

You are not restricted to using a single cache in an application. Each instance of the Windows Azure Caching service belongs to a service namespace, and you can create multiple service namespaces each with its own cache in the same datacenter. Each cache can have a different size, so you can partition your data according to a cache profile; small objects that are accessed infrequently can be held in a 128MB cache, while larger objects that are accessed constantly by a large number of concurrent instances of your applications can be held in a 2GB or 4GB cache.

Implementing Services that Share Data by Using Windows Azure Caching

The Windows Azure Caching service implements an in-memory cache, located on a cache server in a Windows Azure datacenter, which can be shared by multiple concurrent services. It is ideal for holding immutable or slowly changing data, such as a product catalog or a list of customer addresses. Copying this data from a database into a shared cache can help to reduce the load on the database as well as improving the response time of the applications that use this data. It also assists you in building highly scalable and resilient services that exhibit reduced affinity with the applications that invoke them. For example, an application may call an operation in a service implemented as a Windows Azure web role to retrieve information about a specific customer. If this information is copied to a shared cache, the same application can make subsequent requests to query and maintain this customer information without depending on these requests being directed to the same instance of the Windows Azure web role. If the number of client requests increases over time, new instances of the web role can be started up to handle them, and the system scales easily. Figure 4 illustrates this architecture, where an on-premises application employs the services exposed by instances of a web role. The on-premises application can be directed to any instance of the web role, and the same cached data is still available.

Figure 4 - Using Windows Azure Caching to provide scalability

Web applications access a shared cache by using the Windows Azure Caching APIs. These APIs are optimized to support the cache-aside programming pattern; a web application can query the cache to find an object, and if the object is present it can be retrieved. If the object is not currently stored in the cache, the web application can retrieve the data for the object from the authoritative store (such as a SQL Azure database), construct the object using this data, and then store it in the cache.
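The cache-aside flow described above can be sketched as follows. This is not the Windows Azure Caching API: a plain dictionary stands in for the shared cache, another dictionary stands in for the authoritative store, and the function and variable names are hypothetical.

```python
def get_with_cache_aside(cache, key, load_from_store):
    """Return the object for key, populating the cache on a miss.

    cache is any dict-like object standing in for the shared cache.
    load_from_store is a callable that reads the authoritative store and
    returns None when the key does not exist there.
    """
    cached = cache.get(key)
    if cached is not None:
        return cached                      # cache hit

    loaded = load_from_store(key)          # cache miss: go to the store
    if loaded is not None:
        cache[key] = loaded                # make it available to later callers
    return loaded


# Stand-ins for the shared cache and a database table.
shared_cache = {}
customers_table = {"c42": {"name": "Contoso", "city": "Seattle"}}
store_reads = []


def load_customer(key):
    store_reads.append(key)                # record each trip to the store
    return customers_table.get(key)


first = get_with_cache_aside(shared_cache, "c42", load_customer)
second = get_with_cache_aside(shared_cache, "c42", load_customer)
```

In this sketch only the first call reaches the store. The second call is served from the cache, which is the effect that reduces load on the database.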

NoteMarkus says:
Markus Objects you store in the cache must be serializable.

You can specify which cache to connect to either programmatically or by providing the connection information in a dataCacheClient section in the web application configuration file. You can generate the necessary client configuration information from the Management Portal, and then copy this information directly into the configuration file. For more information about configuring web applications to use Windows Azure Caching, see "How to: Configure a Cache Client using the Application Configuration File (Windows Azure Shared Caching)."

As described in the section "Provisioning and Sizing a Windows Azure Cache," an administrator specifies the resources available for caching data when the cache is created. If memory starts to run short, the Windows Azure Caching service will evict data on a least recently used basis. However, cached objects can also have their own independent lifetimes, and a developer can specify a period for caching an object when it is stored; when this time expires, the object is removed and its resources reclaimed.
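To make the interplay between eviction and expiration concrete, here is a toy in-memory model. It is a simplification under stated assumptions: capacity is counted in objects rather than bytes, time comes from an injectable clock so the behavior can be demonstrated, and the class name and parameters are hypothetical rather than part of the Caching service.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Toy cache with least-recently-used eviction and per-object expiry."""

    def __init__(self, capacity, default_ttl, clock=time.monotonic):
        self._capacity = capacity            # maximum number of objects held
        self._default_ttl = default_ttl      # seconds, used when put() gets no ttl
        self._clock = clock
        self._items = OrderedDict()          # key -> (value, expires_at)

    def put(self, key, value, ttl=None):
        lifetime = self._default_ttl if ttl is None else ttl
        self._items[key] = (value, self._clock() + lifetime)
        self._items.move_to_end(key)         # newest entry is most recently used
        while len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]             # expired: reclaim the slot
            return None
        self._items.move_to_end(key)         # a read counts as a use
        return value


now = [0.0]                                  # controllable clock for the example
cache = LruTtlCache(capacity=2, default_ttl=60, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2, ttl=5)
cache.get("a")                               # "a" becomes the most recently used
cache.put("c", 3)                            # over capacity, so "b" is evicted
```

Because an object can disappear through either route, calling code has to treat every read as a possible miss.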

NoteMarkus says:
Markus With the Windows Azure Caching service, your applications are not notified when an object is evicted from the cache or expires, so be warned.

For detailed information on using Windows Azure Caching APIs see "Developing for Windows Azure Shared Caching."

Updating Cached Data

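Web applications can modify the objects held in cache, but be aware that if the cache is being shared, more than one instance of an application might attempt to update the same information; this is identical to the update problem that you meet in any shared data scenario. To assist with this situation, the Windows Azure Caching APIs support two modes for updating cached data: optimistic, in which each cached object has a version that an application can check before it stores its changes, and pessimistic, in which an application locks an object while it updates it.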

If you are hosting multiple instances of the Windows Azure Caching service across different datacenters, the update problem becomes even more acute as you may need to synchronize a cache not only with the authoritative data source but also other caches located at different sites. Synchronization necessarily generates network traffic, which in turn is subject to the latency and occasionally unreliable nature of the Internet. In many cases, it may be preferable to update the authoritative data source directly, remove the data from the cache in the same datacenter as the web application, and let the cached data at each remaining site expire naturally, when it can be repopulated from the authoritative data source.

The logic that updates the authoritative data source should be composed in such a way as to minimize the chances of overwriting a modification made by another instance of the application, perhaps by including version information in the data and verifying that this version number has not changed when the update is performed.

The purpose of removing the data from the cache rather than simply updating it is to reduce the chance of losing changes made by other instances of the web application at other sites and to minimize the chances of introducing inconsistencies if the update to the authoritative data store is unsuccessful. The next time this data is required, a consistent version of the data will be read from the authoritative data store and copied to the cache.
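The update-then-remove approach, combined with a version check, might look like the following sketch. Everything here is a stand-in: `VersionedStore` is a hypothetical in-memory substitute for the authoritative data source, and a dictionary plays the part of the cache in the local datacenter. A real store would perform the version comparison and the write as one atomic operation.

```python
class StaleVersionError(Exception):
    """Raised when the stored version differs from the one the caller read."""


class VersionedStore:
    """In-memory stand-in for an authoritative store with a version column."""

    def __init__(self):
        self._rows = {}                      # key -> (value, version)

    def insert(self, key, value):
        self._rows[key] = (value, 1)

    def read(self, key):
        return self._rows[key]               # returns (value, version)

    def update(self, key, new_value, expected_version):
        _, current_version = self._rows[key]
        if current_version != expected_version:
            raise StaleVersionError(key)     # another instance updated it first
        self._rows[key] = (new_value, current_version + 1)


def update_then_invalidate(store, local_cache, key, new_value, expected_version):
    """Update the store first; drop the cached copy only if that succeeds."""
    store.update(key, new_value, expected_version)
    local_cache.pop(key, None)               # the next read repopulates the cache


store = VersionedStore()
store.insert("c42", "old address")
cache = {"c42": "old address"}

_, version = store.read("c42")
update_then_invalidate(store, cache, "c42", "new address", version)
```

If the update raises, the cached copy is left untouched and still matches the store, so a failed write does not introduce an inconsistency.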

If you require a more immediate update across sites, you can implement a custom solution by using Service Bus topics, implementing a variation on the patterns described in the section "Replicating and Synchronizing Data Using Service Bus Topics and Subscriptions" in "Appendix A - Replicating, Distributing, and Synchronizing Data."

Both approaches are illustrated later in this appendix, in the section "Guidelines for Using Windows Azure Caching."

NoteJana says:
Jana Incorporating Windows Azure Caching into a web application must be a conscious design decision as it directly affects the update logic of the application. To some extent you can hide this complexity and aid reusability by building the caching layer as a library and abstracting the code that retrieves and updates cached data, but you must still implement this logic somewhere.

The nature of the Windows Azure Caching service means that it is essential you incorporate comprehensive exception-handling and recovery logic into your web applications. For example:
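One possible shape for such recovery logic is sketched below: a read that treats any cache failure (for instance, an exception raised once a quota has been exceeded) as a miss and falls back to the authoritative store. `CacheUnavailableError` and the function names are hypothetical placeholders, not exception types or calls from the Caching APIs.

```python
class CacheUnavailableError(Exception):
    """Hypothetical stand-in for any exception raised by a cache call."""


def read_through_failures(cache_get, cache_put, load_from_store, key):
    """Serve a read even when the cache is failing."""
    try:
        cached = cache_get(key)
        if cached is not None:
            return cached
    except CacheUnavailableError:
        pass                                 # treat a cache failure as a miss

    value = load_from_store(key)
    try:
        if value is not None:
            cache_put(key, value)
    except CacheUnavailableError:
        pass                                 # the caller still gets its data
    return value


# Simulate a cache whose quota has been exhausted.
def failing_get(key):
    raise CacheUnavailableError("quota exceeded")


def failing_put(key, value):
    raise CacheUnavailableError("quota exceeded")


catalog = {"p1": "Widget"}
result = read_through_failures(failing_get, failing_put, catalog.get, "p1")
```

A production version would normally log the failure as well, so that sustained quota problems are noticed rather than silently absorbed by the data source.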

Implementing a Local Cache

As well as the shared cache, you can configure a web application to create its own local cache. The purpose of a local cache is to optimize repeated read requests to cached data. A local cache resides in the memory of the application, and as such it is faster to access. It operates in tandem with the shared cache. If a local cache is enabled, when an application requests an object, the caching client library first checks to see whether this object is available locally. If it is, a reference to this object is returned immediately without contacting the shared cache. If the object is not found in the local cache, the caching client library fetches the data from the shared cache and then stores a copy of this object in the local cache. The application then references the object from the local cache. Of course, if the object is not found in the shared cache, then the application must retrieve the object from the authoritative data source instead.

Once an item has been cached locally, the local version of this item will continue to be used until it expires or is evicted from the cache. However, it is possible that another application may modify the data in the shared cache. In this case the application using the local cache will not see these changes until the local version of the item is removed from the local cache. Therefore, although using a local cache can dramatically improve the response time for an application, the local cache can very quickly become inconsistent if the information in the shared cache changes. For this reason you should configure the local cache to only store objects for a short time before refreshing them. If the data held in a shared cache is highly dynamic and consistency is important, you may find it preferable to use the shared cache rather than a local cache.

After an item has been copied to the local cache, the application can then access it by using the same Windows Azure Caching APIs and programming model that operate on a shared cache; the interactions with the local cache are completely transparent. For example, if the application modifies an item and puts the updated item back into the cache, the Windows Azure Caching APIs update the local cache and also the copy in the shared cache.
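The lookup order and the staleness trade-off described in this section can be modeled with a small two-level cache. This sketch is illustrative only. The class and its parameters are hypothetical, a dictionary stands in for the shared cache, the store lookup is folded into `get` for brevity (in practice the application performs that step itself), and the limit on the number of locally cached objects is omitted.

```python
import time


class TwoLevelCache:
    """Local in-process cache in front of a shared cache and a data store."""

    def __init__(self, shared, load_from_store, local_ttl, clock=time.monotonic):
        self._shared = shared                # dict-like stand-in for shared cache
        self._load = load_from_store         # reads the authoritative source
        self._local_ttl = local_ttl          # keep this short to limit staleness
        self._clock = clock
        self._local = {}                     # key -> (value, expires_at)

    def _store_locally(self, key, value):
        self._local[key] = (value, self._clock() + self._local_ttl)

    def get(self, key):
        entry = self._local.get(key)
        if entry is not None:
            value, expires_at = entry
            if self._clock() < expires_at:
                return value                 # no trip to the shared cache
            del self._local[key]             # local copy has expired

        value = self._shared.get(key)
        if value is None:
            value = self._load(key)          # not cached anywhere: use the store
            if value is not None:
                self._shared[key] = value
        if value is not None:
            self._store_locally(key, value)
        return value

    def put(self, key, value):
        self._store_locally(key, value)      # update the local copy
        self._shared[key] = value            # and the copy in the shared cache


now = [0.0]
shared = {"p1": "v1"}
app = TwoLevelCache(shared, lambda key: None, local_ttl=30, clock=lambda: now[0])

before = app.get("p1")                       # copied into the local cache
shared["p1"] = "v2"                          # another application changes it
stale = app.get("p1")                        # still the old local copy
now[0] = 31
refreshed = app.get("p1")                    # local copy expired, change is seen
```

The example shows why a short local expiration period matters: the change made by the other application stays invisible until the local copy expires.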

A local cache is not subject to the same resource quotas as the shared cache managed by the Windows Azure Caching service. You specify the maximum number of objects that the cache can hold when it is created, and the storage for the cache is allocated directly from the memory available to the application.

NoteMarkus says:
Markus You enable local caching by populating the LocalCacheProperties member of the DataCacheFactoryConfiguration object that you use to manage your cache client configuration. You can perform this task programmatically or declaratively in the application configuration file. You can specify the size of the cache and the default expiration period for cached items. For more information, see the topic "Enable Windows Server AppFabric Local Cache (XML)."

Caching Web Application Session State

The Windows Azure Caching service enables you to use the DistributedCacheSessionStateStoreProvider session state provider for ASP.NET web applications and services. With this provider, you can store session state in a Windows Azure cache. Using a Windows Azure cache to hold session state gives you several advantages:

You can configure this provider either through code or by using the application configuration file; you can generate the configuration information by using the Management Portal and copy this information directly into the configuration file. For more information, see "How to: Configure the ASP.NET Session State Provider for Windows Azure Caching."

Once the provider is configured, you access it programmatically through the Session object, employing the same code as an ordinary ASP.NET web application; you do not need to invoke the Windows Azure Caching APIs.

Caching HTML Output

The DistributedCacheOutputCacheProvider class available for the Windows Azure Caching service implements output caching for web applications. Using this provider, you can build scalable web applications that take advantage of the Windows Azure Caching service for caching the HTTP responses that they generate for web pages returned to client applications, and this cache can be shared by multiple instances of an application. This provider has several advantages over the regular per process output cache, including:

Again, you can generate the information for configuring this provider by using the Management Portal and copy this information directly into the application configuration file. For more information, see "How to: Configure the ASP.NET Output Cache Provider for Windows Azure Caching."

Like the DistributedCacheSessionStateStoreProvider class, the DistributedCacheOutputCacheProvider class is completely transparent; if your application previously employed output caching, you do not have to make any changes to your code.

Guidelines for Using Windows Azure Caching

The following list describes some common scenarios for using Windows Azure Caching:

Limitations of Windows Azure Caching

The features provided by the Windows Azure Caching service are very similar to those of Windows Server AppFabric Caching; they share the same programmatic APIs and configuration methods. However, the Windows Azure implementation provides only a subset of the features available in the Windows Server version. Currently, the Windows Azure Caching service has the following limitations compared to Windows Server AppFabric Caching:

Note:
Windows Azure Caching may remove some of these limitations in future releases.

You should also note that a Windows Azure cache automatically benefits from the reliability and scalability features of Windows Azure; you do not have to manage these aspects yourself. Consequently, many of the high availability features of Windows Server AppFabric Caching are not available because they are not required in the Windows Azure environment.

For more information about the differences between Windows Azure Caching and Windows Server AppFabric Caching, see the topic "Differences Between Caching On-Premises and in the Cloud."

Guidelines for Securing Windows Azure Caching

You access a Windows Azure cache through an instance of the Windows Azure Caching service. You generate an instance of the Windows Azure Caching service by using the Management Portal and specifying a new service namespace for the Caching service. The Caching service is deployed to a datacenter in the cloud, and has endpoints with URLs that are based on the name of the service namespace with the suffix ".cache.windows.net". Your applications connect to the Caching service using these URLs. The Caching service exposes endpoints that support basic HTTP connectivity (via port 22233) as well as SSL (via port 22243).
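As a small illustration of the endpoint naming rule above, the following helper derives the host name and port for a given service namespace. The function name and the `contoso` namespace are hypothetical. The suffix and port numbers are the ones quoted in the preceding paragraph.

```python
CACHE_HOST_SUFFIX = ".cache.windows.net"
PLAIN_PORT = 22233                           # basic connectivity
SSL_PORT = 22243                             # SSL connectivity


def cache_endpoint(service_namespace, use_ssl=False):
    """Return the (host, port) pair for a Caching service namespace."""
    host = service_namespace + CACHE_HOST_SUFFIX
    port = SSL_PORT if use_ssl else PLAIN_PORT
    return host, port


endpoint = cache_endpoint("contoso", use_ssl=True)
```

In practice these values are usually copied from the client configuration that the Management Portal generates rather than assembled in code.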

All connection requests from an application to the Windows Azure Caching service are authenticated and authorized by using ACS. To connect to the Caching service, an application must provide the appropriate authentication token.

NoteBharath says:
Bharath Only web applications and services running in the cloud need to be provided with the authentication token for connecting to the Windows Azure Caching service as these are the only items that should connect to the cache. Utilizing a Windows Azure cache from code running externally to the datacenter provides little benefit other than for testing when using the Windows Azure compute emulator, and is not a supported scenario for production purposes.

More Information

All links in this book are accessible from the book's online bibliography available at: http://msdn.microsoft.com/en-us/library/hh968447.aspx.