The first stage in moving parts of the Orders system to the cloud as elements of a hybrid application required the designers at Trey Research to consider how to deploy these pieces in Windows Azure™ technology platform. Windows Azure offers several options for deployment of application functionality, and a wide range of associated services that Trey Research can take advantage of when designing and building hybrid applications.
In this chapter, you will see how Trey Research addressed the challenges associated with deploying the key elements of the Orders application to the cloud, and how the designers integrated the application with the services provided by Windows Azure and the SQL Azure™ technology platform.
Scenario and Context
In the original implementation of the Orders application, the components and services it uses ran on-premises and accessed data stored in local SQL Server databases in Trey Research's datacenter. You saw the architecture and a description of the original on-premises system in Chapter 1, "The Trey Research Scenario." Trey Research had to decide how to segregate the functionality, the types of Windows Azure roles to use, and how this might architecture affects the security, performance, and reliability of the application.
In addition, the designers had to consider where and how to host the data used by the application when some parts of the application are located remotely and communication must cross the Internet, and how to maintain the ability to produce business reports from that data.
When they examined the existing Orders application with a view to moving some parts to Windows Azure, it soon became clear that the management and reporting part of the application, which does not need to scale to the same extent as the public website, should remain on premises. This allowed Trey Research to more closely control the aspects of the application that require additional security and which, for logistical reasons, they felt would be better kept within their own datacenter. However, Trey Research wished to make some non-confidential elements of the reporting data available to trusted partners for use in their own systems.
The public section of the application could easily be deployed to the cloud as it was already effectively a separate application, and is the part of the application that will be required to scale most over time to meet elastic demand. This allowed Trey Research to take full advantage of the cloud in terms of reliability, availability, security, lower running costs, reduced requirements for on-premises infrastructure, and the capability to scale up and down at short notice to meet peaks in demand.
There are other advantages to hosting in Windows Azure that served to make a strong case for moving the public parts of the Orders application to the cloud. These include the ability to deploy it to multiple datacenters in different geographical locations to provide better response times and to maximize availability for customers. By using Windows Azure Traffic Manager, requests to the application are automatically routed to the instance that will provide the best user experience. Traffic Manager also handles failed instances by rerouting requests to other instances.
In addition, Trey Research were able to take advantage of the built-in distributed data caching feature for transient data used by the public website, the claims-based authentication service for easily implementing federated authentication, the connectivity features for secure communication and service access across the cloud/on-premises boundary, the capabilities for data synchronization, a comprehensive cloud-based reporting system, and the availability of third party components and frameworks to simplify development.
Markus says: | |
---|---|
Taking advantage of available components, services, frameworks, and features designed and optimized for the cloud simplifies both the design and development of cloud-based applications. |
Figure 1 shows a high-level view of the way that Trey Research chose to segregate the parts of the application across the cloud and on-premises boundary.
In this chapter you will see how the designers at Trey Research chose where to locate the data the application uses, how they implemented a synchronization mechanism that ensures that the relevant data is available and consistent in all of the locations where it is required, and how they maintain comprehensive business intelligence reporting capabilities. These decisions required the designers to consider the options available, and the tradeoffs that apply to each one.
Deploying the Application and Data to the Cloud
The Orders application is a website, and so the designers at Trey Research realized that this could easily be deployed in Windows Azure as a web role. Deploying multiple instances of the web role allows the website to scale to meet demand, and ensures that it provides the availability and reliability that Trey Research requires. Background processing tasks, which occur after a customer places an order, are handed off to a worker role. Trey Research can deploy multiple instances of the worker role to handle the varying load as customers place orders in the website.
Markus says: | |
---|---|
You write new web applications or adapt existing web applications for deployment to Windows Azure in a very similar manner to that you would follow if you were building items for local deployment in your own datacenter. However, there are some aspects that differ, such as session state management, data storage, and configuration. |
The Orders website requires access to several items of data as it runs. This data includes the list of products that customers can order, the list of customers so that the application can authenticate visitors and access information about them, the orders that customers place at the website, and auditing and runtime logging information. The designers at Trey Research needed to decide where and how to locate each of these items, and also identify the appropriate storage mechanism for this data.
Choosing the Location for Data
All elements of a hybrid application, whether they are located on-premises, in the cloud, or at a partner location, are likely to need to access data. A fundamental part of the design of a hybrid application is locating this data in the appropriate places to maximize efficiency and performance, while maintaining security and supporting any replication and synchronization requirements. Typically, data should be located as close as possible to the applications and components that use it. However, this is not always advisable, or possible, depending on individual circumstances.
The major decision is whether to locate data remotely (such as in the cloud or at a partner location), or to keep it on-premises. The Orders application uses four types of data:
- Customer information, including sensitive data such as credit limits and payment information. This includes personally identifiable information (PII) and must be protected to the highest extent possible.
- Product information such as the product catalog, prices, and details. Trey Research manufactures all products to order, and so there is no stock level data.
- Order information, including full details of orders placed by customers and delivery information.
- Audit log information, such as events and exceptions raised by the application and details of orders over a total value of $10,000. This data may contain sensitive information that must be fully secured against access by non-administrative staff.
The designers at Trey Research considered three options for locating the data used by the Orders application. They could deploy all of the data in the cloud, keep all of the data on-premises, or deploy some in the cloud while the rest remains on-premises.
Deploy All of the Data in the Cloud
Deploying all of the data in the cloud so that it is close to the Orders application can help to maximize performance and minimize response times, and removes the requirement to synchronize data between cloud and on-premises locations. It also allows Trey Research to take advantage of the scalability and performance of either Windows Azure storage or SQL Azure, both of which provide reliable, fast, and efficient data access for the application and make it easy to expand storage availability as required.
However, deploying all of the data in the cloud would mean head-office applications that require access to this data must do so over the Internet. This could cause users in the head office to encounter delays and failed connections due to occasional Internet networking and performance issues, and additional costs would be incurred for access to the data from the on-premises applications. In addition, the storage costs for deploying large volumes of data or multiple databases could be an issue, and there is still likely to be a requirement to synchronize the data between these deployments if the application is located in more than one datacenter.
Keep All Data On-premises
Keeping all of the data on-premises means that it can be secured and managed by Trey Research administrators and operations staff more easily, especially if most of the update operations are done by on-premises staff and other on-premises applications within the organization. This approach also allows Trey Research to ensure they comply with legal or regulatory limitations on the location and security of sensitive information. In addition, there is no requirement to migrate or deploy data to a remote location, and other operations such as backing up data are easier.
However, keeping all of the data on-premises means that remote applications and services in the cloud or at partner locations must access the data over the Internet, although this can be mitigated to some extend by the judicious use of caching. The designers at Trey Research also considered whether it would be possible to implement the required business logic so that it worked securely and reliably when remote applications and services must perform updates across the Internet in multiple databases.
Jana says: | |
---|---|
Accessing data held on-premises from a cloud-hosted application is not usually the best approach due to the inherent network latency and reliability of the Internet. If you decide to follow this approach, you must consider using a robust caching mechanism such as Windows Azure Caching to minimize the impact of network issues. |
Deploy Some of the Data in the Cloud
Deploying some of the data in the cloud and keeping the remainder on-premises provides several advantages. For example, data for applications and services that require fast and reliable access can be located in the cloud, close to the application or service that uses it, whereas data that is mostly accessed by head office applications can remain on-premises to provide fast and reliable access for these applications. In addition, data that is subject to legal or regulatory limitations regarding its storage location, or requires specific security mechanisms to be in place, can remain on-premises. Finally, data that does not need to scale can remain on-premises, saving hosting costs, whereas data that must scale can be located in Windows Azure storage or SQL Azure to take advantage of the scalability these services offer.
However, deploying some of the data in the cloud means that, where it is used in both cloud-hosted or on-premises applications, it will still need to be accessed across the Internet. A suitably secure and reliable connectivity mechanism will be required, and a data replication and synchronization solution will be necessary to ensure that data in all locations is consistent.
How Trey Research Chose the Location for Deploying Data
After considering the options for where to deploy data, Trey Research made the following decisions for locating the information used by the Orders application.
Customer Data
Customer information is maintained by Trey Research's own operations staff in conjunction with the existing on-premises accounting system that Trey Research uses within its wider organization. Trey Research requires customers to register through the head office, and operators add customers to the on-premises database. Using the Orders application, it is planned that customers will be able modify some of their own information (this functionality is not yet implemented), but the application will not allow them to modify critical identity or other secure data. Customer data is likely to be relatively static and not change much over time.
Trey Research decided to keep the master Customer database on-premises to maximize security, and to maintain the existing capabilities of all the on-premises applications to interact with the data efficiently. However, customer data is also required by the Orders website to authenticate visitors and to accept orders from them. Therefore, to maximize performance and reliability, Trey Research decided to locate a replica of the customer data in the cloud, close to the Orders website.
This means that a bidirectional synchronization mechanism is required to ensure that updates to the customer data made by on-premises operators are replicated to all datacenters that host the Orders application, and changes made in the Orders application by customers to certain parts of their own data are replicated back to the master copy of the data held on-premises and out to the SQL Azure databases located in other datacenters.
Product Data
Product information is also maintained by Trey Research's operations staff. This data can only be updated on-premises in conjunction with the existing on-premises manufacturing processes and parts catalogs that Trey Research uses within its wider organization. Because there is no stock level information (all products are manufactured on-demand), the Product data is relatively static.
Trey Research decided to keep the master Product data on-premises to maintain the existing capabilities of all the on-premises applications to interact with the data efficiently. However, to maximize performance and reliability, Trey Research decided to locate a replica of some fields of the Product data (just the data required to list products, show product details, and accept orders) in the cloud, close to the Orders application. This means that a unidirectional synchronization mechanism is required to ensure that updates to the Product data made by on-premises operators are replicated to all datacenters that host the Orders application.
Order Data
Order information is generated by the Orders application running in the cloud, and cannot be edited elsewhere. The Orders application also reads Order data when displaying lists of current orders and delivery information to users. Unlike Customer and Product data, which is relatively static, Order data is highly dynamic because it changes as customer place orders and as they are shipped by the transport partners.
Trey Research decided that there was no requirement to locate Order data on-premises. Instead, Order data is stored only in the cloud, close to the Orders application. However, when the Orders application is deployed to more than one datacenter, bi-directional synchronization of the order data between datacenters ensures that customers see their order information if, due to an application failure (or when a user moves to a new geographical location), they are redirected to a different datacenter. The only issue with this decision is that Trey Research will no longer be able to use SQL Server Reporting Services to create business intelligence reports on the data directly. You will see how Trey Research resolved this issue later in this chapter, in the section "Choosing a Reporting Solution."
Audit Log Data
Audit log information is generated by the Orders application in response to events and exceptions raised by the application, and for orders over a total value of $10,000. It is also generated by other on-premises applications within Trey Research's organization, and so the Audit Log database is a complete repository for all application management and monitoring facilities.
Trey Research decided that, because the most intensive access to this data is from monitoring tools and administrative management applications, the data should remain on-premises. In addition, government regulations on the sale of some high-tech products that Trey Research manufactures means Trey Research must maintain full and accurate records of such sales and store these records locally. Keeping the Audit Log data, which may contain sensitive information about the application, on-premises also helps to ensure that it is fully secured within Trey Research's domain against access by unauthorized parties.
Choosing the Data Storage Mechanism
Having decided that some of the data used by the Orders application will be hosted in Windows Azure, the designers at Trey Research needed to choose a suitable mechanism for storing this data in the cloud. The most common options are Windows Azure storage, SQL Azure or another database system, or a custom repository.
Windows Azure Storage
Windows Azure storage provides blob storage, table storage, and queues. Queues are typically used for passing information between roles and services, and are not designed for use as a persistent storage mechanism. However, Trey Research could use table storage or blob storage. Both of these are cost-effective ways of storing data.
Blob storage is ideal for storing unstructured information such as images, files, and other resources. Table storage is best suited to structured information. Table storage is very flexible and can be very efficient, especially if the table structure is designed to maximize query performance. It also supports geographical replication, so that access is fast and efficient from different client locations. Table storage is significantly cheaper than using a SQL Azure database.
However, table storage does not support the familiar SQL-based techniques for reading and writing data, and some of the standard relational database data types. Data is stored as collections of entities, which are similar to rows but each has a primary key and a set of properties. These properties consist of a name and a series of typed-value pairs. The designers at Trey Research realized that migrating an existing application that uses a SQL database to the cloud, and deciding to use Windows Azure table storage, meant that they would need to redesign their data model and rewrite some of the data access code. This would add cost and time to the migration process.
In addition, Windows Azure table storage does not support the concept of database transactions, although it does provide transacted access to a single table. Finally, data cannot be directly imported from a relational database system such as SQL Server into table storage. Trey Research would need to create or source tools to perform the translation and upload the data.
Note: |
---|
For more information about using Windows Azure table storage, see the section "Storing Business Expense Data in Windows Azure Table Storage" in Chapter 5 of the guide "Moving Applications to the Cloud." |
SQL Azure
SQL Azure is a high-performance database service that fully supports SQL-based operations, and can be used to create relational databases in the cloud. It is implemented by SQL Server instances installed in Microsoft datacenters.
SQL Azure offers much of the core functionality of a local SQL Server installation, and it delivers data to the application using the familiar SQL Server Tabular Data Stream (TDS) protocol. This architecture enables you to use the same .NET Framework data providers (such as System.Data.SqlClient) to connect to the database, and T-SQL to access and manipulate the data. SQL Azure is also compatible with existing connectivity APIs, such as the Entity Framework (EF), ADO.NET, and Open Database Connectivity (ODBC). Data can be updated using database transactions to ensure consistency.
These advantages mean that developers at Trey Research would not have to make major changes to the application code, and administrators could quickly and easily deploy the data to SQL Azure without needing to change the schema of the tables. Trey Research administrators and operators can manage SQL Azure databases through the Windows Azure Management Portal, and by using familiar tools such as SQL Server Management Studio and the Visual Studio database tools. A range of other tools for activities such as moving and migrating data, as well as command line tools for deployment and administration, are also available.
In addition, data synchronization across cloud-hosted and on-premises databases is easy to achieve through the Windows Azure Data Sync service or the Data Sync APIs. SQL Azure supports business intelligence reporting with the SQL Azure Reporting Service.
However, the designers at Trey Research also needed to consider that, while SQL Azure is very similar to SQL Server, certain concepts such as server-level controls or physical file management do not apply in an auto-managed environment such as SQL Azure. In addition, the subscription costs for SQL Azure are higher than those of Windows Azure storage.
Alternative Database System or Custom Repository
If your application currently uses a relational database system, or utilizes a custom repository to store its data, you may be able to migrate the data to SQL Azure easily—depending on the existing format of the data. Alternatively, if you use a database system other than SQL Server (such as Mongo DB, see http://www.mongodb.org/), you might be able to run this database system in the cloud using the Windows Azure worker role or VM role.
Using an existing database system or custom repository that already provides data for your application means that you will probably be able to use the same data access code as you employed on-premises. This is an advantage if developers are familiar with the mechanism you choose, and it can reduce the transition time and effort of learning a new system.
However, using an alternative database system or custom repository means that you must maintain this database or repository yourself. For example, you must install updated versions of the database management software or debug your own custom code. You may also have difficulty importing data or moving data to another data storage mechanism in the future.
How Trey Research Chose a Storage Mechanism for Data
Trey Research uses SQL Server to store data in their on-premises applications, including the original Orders application. The data formats and types, and the data access code, are all designed to work with SQL Server. Therefore, it made sense for Trey Research to choose SQL Azure as the data storage mechanism for the hybrid version of the Orders application. The additional cost compared to using Windows Azure table storage is partly mitigated by the savings in schema redesign and code development costs.
In addition, Trey Research wanted to be able to use database transactions and perform complex queries when working with data. Implementing code to achieve the equivalent functionality using Windows Azure table storage would require additional development time and incur subsequent additional costs. Administrators at Trey Research are also familiar with SQL Server, including the tools used to manage data, and are comfortable using systems based on SQL Server so working with SQL Azure does not require them to learn new paradigms.
Encrypting Data Stored in Windows Azure Storage and Databases
The designers at Trey Research realized that when moving data to the cloud, they must consider the level of protection required for that data, irrespective of the selected storage mechanism. Sensitive data, such as customers' passwords and credit card numbers, and PII such as addresses and telephone numbers, typically require higher levels of protection than data such as product lists.
At the time of writing, neither Windows Azure storage nor SQL Azure support built-in data encryption mechanisms. This means that the application is responsible for encrypting or decrypting sensitive data that requires an additional level of protection. Trey Research achieves this by using the standard cryptography algorithms exposed by the .NET Framework, or with other code libraries.
Note: |
---|
For information about encrypting data in Windows Azure, see "Crypto Services and Data Security in Windows Azure" in MSDN® Magazine and "Encrypting Data in Windows Azure Storage." For details of the security features of SQL Azure, see "Security Guidelines and Limitations (SQL Azure Database)." |
Synchronizing Data across Cloud and On-Premises Locations
The architecture Trey Research chose for the Orders application has some data located in the cloud in SQL Azure, and some data located on-premises. This means that the designers at Trey Research must consider how to synchronize data across these locations to ensure it is consistent.
Choosing a Data Synchronization Solution
The choice of data synchronization solution depends on both the type of data stores that hold the data and the requirements for consistency. For example, if data must always be consistent across different locations, the solution must detect and replicate changes to data in each location as soon as they occur. If the application can work successfully when data is eventually consistent, but may be stale for short periods, a scheduled synchronization process may be sufficient. The following sections of this chapter describe the options that Trey Research considered for synchronizing data in the Orders application.
SQL Azure Data Sync
If data is deployed to SQL Azure, the natural solution for synchronizing this data is to use SQL Azure Data Sync. This is a service that can synchronize data between on-premises SQL Server databases and one or more SQL Azure databases hosted in the cloud. SQL Azure Data Sync offers a variety of options for unidirectional and bi-directional synchronization.
Using SQL Azure Data Sync would mean that the developers at Trey Research wouldn't need to write any custom code because synchronization is configured and managed through the Windows Azure web portal. This helps to reduce the cost and time required to implement a solution compared to building a custom solution.
However, SQL Azure Data Sync works with only SQL Server and SQL Azure databases; it cannot be used if data is stored in Windows Azure storage or another database system. In addition, SQL Azure Data Sync imposes some restrictions on column data types and nullability that may necessitate changes to existing database schemas. SQL Azure Data Sync handles conflicting changes made in different databases by using one of a small number of predefined policies. It isn't possible to customize these policies, and SQL Azure Data Sync does not provide synchronization events that you can use to implement your own mechanism.
The designers at Trey Research also realized that in some scenarios synchronization requires two passes to complete; the data is moved to a hub database first (which may be one of the existing operational databases) and then to client databases. This means that, when there is more than one database synchronizing from the hub database, some instances of the data may be stale until the second synchronization pass occurs. However, when simply synchronizing one on-premises database to the SQL Azure hub database, all updates are applied during a single pass.
Note: |
---|
See "Appendix A - Replicating, Distributing, and Synchronizing Data" for more information about using SQL Azure Data Sync. |
Microsoft Sync Framework
SQL Azure Data Sync uses the components of the Microsoft Sync Framework to perform data synchronization. The Sync Framework is a comprehensive synchronization platform that supports any data type, any data store, any transfer protocol, and any network topology. It is not confined to use with just SQL Server and SQL Azure databases.
If the developers at Trey Research needed more control over the synchronization process, they could use the components of the Sync Framework SDK directly in code. This has the advantage that the application could react to events, such as data being changed, and initiate synchronization. The application could also handle events occurring during the synchronization process to manage conflicts and errors, or to provide more traceability. Of course, it will also mean that the developers would have to write additional code to control the synchronization process, which would incur additional cost and time compared to using the SQL Azure Data Sync service.
Note: |
---|
For more information about the Sync Framework SDK, see "Microsoft Sync Framework Developer Center." |
A Custom or Third Party Synchronization Solution
If Trey Research decided not to use SQL Azure Data Sync or the Microsoft Sync Framework, the designers could have considered implementing a custom or third party solution for synchronizing data. In particular, where there are special requirements for synchronizing or replicating data, a custom mechanism might be a better choice than an off the shelf solution. For example, if Trey Research needed to carry out specific types of synchronization not supported by available third-party solutions or services, a custom mechanism that passes messages between services located on-premises and at each datacenter using Windows Azure Service Bus brokered messaging could have been be a good choice.
Messaging solutions are flexible and can be used across different types of data repository because the service that receives update messages can apply the update operations in the repository using the appropriate methods. Message-based replication and synchronization solutions are particularly suited to performing real-time updates, but this was not a requirement of the Orders application.
In addition, messaging solutions can expose more information about the synchronization process as it proceeds; for example, allowing developers to trace each data modification and handle conflicts or errors in an appropriate way. It is also possible to implement a solution that follows the principles of the Command Query Responsibility Segregation (CQRS) pattern by separating the queries that extract data from the commands that update the target data repository.
However, if you cannot locate a third party solution that provides the required features and can interface with your existing data stores, and you decide to create a custom solution, implementing, testing, and debugging this solution is likely to incur additional costs and require additional development time.
Note: |
---|
See "Appendix A - Replicating, Distributing, and Synchronizing Data" for more information about creating a custom message-based data synchronization solution. |
How Trey Research Chose the Data Synchronization Solution
The designers at Trey Research decided to use SQL Azure Data Sync as the replication and synchronization solution for the Orders application. All of the data is stored in either SQL Server on-premises or SQL Azure in the cloud, and so SQL Azure Data Sync will be able to access and synchronize all of the data as required. The saving in development cost and time compared to a custom solution compensated to some extent for the costs of using the SQL Azure Data Sync service.
How Trey Research Uses SQL Azure Data Sync
Trey Research stores information about products, customers, and the orders that these customers have placed. Trey Research uses a combination of SQL Server running on-premises and SQL Azure hosted at each datacenter to manage the data required by the Orders application. Therefore Trey Research decided to implement data replication and synchronization in the Orders application.
This section is provided for information only. For simplicity, the sample solution is deployed to a single datacenter and, therefore, is not configured to replicate and synchronize data across multiple datacenters.
The different types of information that Trey Research synchronizes are managed and maintained in different ways, specifically:
- Order data is maintained exclusively in the cloud by the Orders application using SQL Azure, and is synchronized between datacenters. This information is not propagated back to the on-premises database.
- Product information is maintained exclusively on-premises by using SQL Server, but the details required for placing orders are copied to each SQL Azure database at each datacenter on a periodic basis.
- New customers are registered on-premises and their details are added to the SQL Server database held at Head Office. These details are replicated out to SQL Azure at each datacenter, enabling a customer to log in and access the Orders application without the system requiring recourse to the Head Office. In the future, once an account has been created, the Orders application may enable certain customer information to be changed by a customer without requiring the intervention of the Head Office, and these changes will be made to the SQL Azure database located in whichever datacenter the customer is currently connected to (this functionality is not currently implemented, but Trey Research wished to deploy the Customers data to allow for this eventuality). These changes will then be subsequently propagated back to the Head Office, and also replicated out to the other datacenters.
Figure 2 shows the solution Trey Research adopted.
In this solution, the Product data is synchronized one way, from the on-premises database to the cloud. The Orders data is replicated bidirectionally between datacenters. The Customer data is also replicated bidirectionally, but including the on-premises database as well as those in the datacenters.
Figure 3 shows the physical implementation of these approaches based on SQL Azure Data Sync. This implementation uses three sync groups; each sync group defines a sync dataset and conflict resolution policy for each type of data (as described above, there are two overlapping sync groups for replicating customer details). The SQL Azure databases located in the US North Data Center also act as the synchronization hubs. This is the nearest datacenter to the head office (the Trey Research head office is located in Illinois), so selecting this location helps to reduce the network latency when synchronizing with the on-premises database.
The following table summarizes the configuration of each sync group implemented by Trey Research.
Note: |
---|
The information for each customer spans two tables; the Customer table which contains the public information about a customer, and the ACSIdentity table which contains the ACS tokens that identify each customer. This is required because, if more than one ACS instance is used to authenticate customers, some identity providers will return a different ACS user identifier from each instance. Therefore, the database must be able to associate more than one user ACS identifier with each customer record. However, the sample solution provided with this guide only implements a single ACS identity for each user, stored in the UserName column in the Customer table in the TreyResearch database. The sample solution does not include the ACSIdentity table because ACS identifiers are stored in the UserName column in the Customer table. See Chapter 3, "Authenticating Users in the Orders Application," for more information about how Trey Research uses ACS to authenticate users. |
Implementing a Reporting Solution for Cloud-Hosted Data
In some cases, moving functionality to the cloud will preclude you from using existing services. For example, in the original on-premises version of the Orders application, Trey Research used SQL Server to store all corporate data, and generated business intelligence reports by using SQL Server Reporting Services. However, when Trey Research moved the source data (the Orders database) to SQL Azure, the designers needed to consider whether running a reporting system on the on-premises network was still feasible. It is possible to run an on-premises reporting system and connect to a SQL Azure database over the Internet, but this approach is likely to consume large amounts of bandwidth and provide poor performance.
Choosing a Reporting Solution
The designers at Trey Research decided that they would need to find a better solution for creating business intelligence reports from the cloud-hosted Orders database. The following sections of this chapter describe the options for business intelligence reporting that Trey Research considered.
SQL Server Reporting Services
SQL Server Reporting Services is a feature of SQL Server that allows you to create attractive and comprehensive business intelligence reports from data stored in the database tables, or from a variety of other data sources. If you use SQL Server to store corporate information, SQL Server Reporting Services provides an easy-to-use solution for creating reports.
SQL Server Reporting Services can read data from relational, multidimensional, XML, and custom data sources, and generate reports in a variety of formats including web-oriented, page-oriented, and desktop application formats. It also supports publishing reports directly to a variety of targets, including a report server, SharePoint® services, file shares, internal archive stores, and Office applications.
SQL Server Reporting Services must connect to the data source to analyze the data. If the data source is remote to the application that uses the report, the process may generate considerable network traffic and may require a service that exposes the data for SQL Server Reporting Services to access. This may impact the security of the data source if it is located remotely from the application that uses the report.
SQL Azure Reporting Service
The SQL Azure Reporting Service is a service exposed by Windows Azure that can generate a range of customized business intelligence reports in common web and Microsoft Office-ready formats. If you use SQL Azure to store corporate information, the SQL Azure Reporting Service provides an easy-to-use solution for creating reports.
The SQL Azure Reporting Service runs in the same datacenter as the SQL Azure database, and so network traffic between SQL Azure Reporting Service and the application that displays the report is minimized. This means that it is likely to provide much faster report generation and better performance than connecting from an on-premises reporting application to the SQL Azure database.
However, the SQL Azure Reporting Service is a chargeable service and so incurs subscription costs, though the consequential reduction in data transfer costs compared to a reporting system located on-premises can help to compensate for this. It also avoids the high initial cost of an on-premises reporting system for organizations that do not already have such a system. Also consider that the SQL Azure Reporting Service offers lower interactivity and reduced capabilities compared to SQL Server Reporting Services and other high-end reporting solutions, although the variety of report formats is usually sufficient for the vast majority of requirements.
Poe says: | |
---|---|
Wide-ranging and up to date information is vital in all companies for managing investment, planning resource usage, and monitoring business performance. The SQL Azure Reporting Service extends these capabilities to data hosted in SQL Azure. |
A Custom or Third Party Reporting Solution
It is possible to use any reporting package or create your own custom data analysis and reporting solution for a hybrid application provided that the solution you choose can access the data stored in your application's database or repository. Some organizations may require a custom solution that integrates with other applications and services, or wish to continue using an existing custom or third party application to create business intelligence reports.
Custom or third party reporting solutions may be closely tailored to the organization's reporting requirements, and this focus on specific areas of interest can provide faster report generation and overall performance compared to more generic solutions. Specially tailored third party solutions may be more appropriate for specialist types of applications, and may cost less than a more generalized solution.
Using an existing reporting solution can reduce the cost of migrating an application to the cloud. However, the reporting solution must be able to connect to the cloud-based data source without compromising the security of the data; this may require developers to create additional services to expose data. In addition, when the data source is remote to the application that uses the report, the process may generate considerable network traffic. Finally, an existing or third party solution may not offer the required variety of formats, or equivalent functionality, compared to SQL Server Reporting Services or the SQL Azure Reporting Service.
How Trey Research Chose the Reporting Solution
When Trey Research moved the Orders application to the cloud, the designers chose to locate the data generated by the application when customers place orders in SQL Azure. Before the move, Trey Research used SQL Server Reporting Services to generate business information from the Orders database.
The designers at Trey Research realized that one solution when the source data is located remotely would be to download all the orders data to an on-premises database, and continue to use SQL Server Reporting Services to analyze it. However, unless data synchronization occurs on a scheduled basis, which will incur additional cost, the data transfer operation will result in longer waiting times while reports are generated. This approach could also cause considerable network traffic over the Internet as data would be repeatedly queried to build the reports.
Instead, after moving the data to SQL Azure, it made more sense to adopt the business intelligence capabilities of the SQL Azure Reporting Service. This approach minimizes network traffic over the Internet, ensures that the most recent data is included in the reports without incurring additional delays, and can still generate the reporting information in a variety of formats.
How Trey Research Uses the SQL Azure Reporting Service
An on-premises reporting service application uses SQL Azure Reporting Service to collate the reports that the management applications running on-premises use. Trey Research extended the usefulness of the reports by combining the raw data available from SQL Azure Reporting with data streams exposed through Windows Azure Marketplace.
Figure 4 shows the architecture Trey Research implemented.
Note: |
---|
For simplicity of installation, the example application for this guide does not include an implementation of the SQL Azure Reporting Service. For more information about the SQL Azure Reporting Service, see "Business Analytics." For more information about incorporating external data feeds into your reports, see "One-Stop Shop for Premium Data and Applications." |
How Trey Research Makes Reporting Data Available to External Partners
Trey Research also decided to expose the report data to specific users who access the on-premises Reporting Service over the Internet through Service Bus Relay.
Trey Research chose to implement a trial version of this functionality while the organization gathered more details on the type of information that external partners should be able to view without intruding on the commercial confidentiality of the business. Therefore, in this first version, Trey Research simply published the total value of goods sold, broken down by reporting quarter or by region, as defined by the service contract available in the IOrdersStatistics.cs file in the Services folder of the HeadOffice project:
[ServiceContract]
public interface IOrdersStatistics
{
[OperationContract]
double SalesByQuarter(int quarter);
[OperationContract]
double SalesByRegion(string region);
}
Subsequent versions of this service may provide a more detailed breakdown of the sales data.
The HeadOffice application, which runs on-premises within Trey Research, hosts a WCF service called OrderStatistics that implements the SalesByQuarter and SalesByRegion operations. These operations simply retrieve the requested data from the underlying database and pass it back as the return value. The implementation of this service is available in the OrdersStatistics.cs file in the Services folder of the HeadOffice project.
The code that initiates the service is located in the OpenServiceHost method in the Global.asax.cs file also in the HeadOffice project.
Note: |
---|
The technique used by the sample application to start the OrderStatistics service is not necessarily applicable to all web applications, and only works in the sample application because the user is expected to explicitly start the web application that hosts the service. In other situations, it may be preferable to utilize the Auto-Start feature of Windows Server and IIS to initialize the service. For more information, see the topic "Auto-Start Feature." |
The service was made available to external partners and users through Windows Azure Service Bus Relay, using the service path Services/RelayedOrdersStatistics, and the application publishes the service through a TCP endpoint using a netTcpRelayBinding binding. The connection to the Service Bus is secured through ACS by using an authentication token; Chapter 3, "Authenticating Users in the Orders Application," provides more information about configuring ACS, and the section "Guidelines for Securing Windows Azure Service Bus Relay" in "Appendix C - Implementing Cross-Boundary Communication" includes guidance for authenticating and authorizing partners connecting to Service Bus Relay. The details of the Service Bus namespace, the service path, the security credentials, and the service configuration are stored in the web.config file of the project. Notice that the web application connects to the Service Bus by using the identity named headoffice; this identity is granted the privileges associated with the Listen claim, enabling the application to accept incoming requests but preventing it from performing other operations such as sending requests:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
...
<appSettings>
...
<!-- ServiceBus config-->
<add key="serviceBusNamespace"
value="treyresearchscenario" />
<add key="UriScheme" value="sb" />
<add key="RelayServicePath"
value="Services/RelayedOrdersStatistics" />
...
</appSettings>
...
<system.serviceModel>
<services>
<service name="HeadOffice.Services.OrdersStatistics">
<endpoint behaviorConfiguration=
"SharedSecretBehavior"
binding="netTcpRelayBinding"
contract=
"HeadOffice.Services.IOrdersStatistics"
name="RelayEndpoint"/>
</service>
</services>
<behaviors>
<endpointBehaviors>
<behavior name="SharedSecretBehavior">
<transportClientEndpointBehavior
credentialType="SharedSecret">
<clientCredentials>
<sharedSecret issuerName="headoffice"
issuerSecret="<data omitted>" />
</clientCredentials>
</transportClientEndpointBehavior>
<serviceRegistrySettings
discoveryMode="Public"
displayName=
"RelayedOrdersStatistics_Service"/>
</behavior>
</endpointBehaviors>
</behaviors>
...
</system.serviceModel>
</configuration>
Trey Research also built a simple command-line application to test the connectivity to the OrderStatistics service through Windows Azure Service Bus Relay, and verify that the SalesByQuarter and SalesByRegion operations function as expected. This application is available in the ExternalDataAnalyzer project. It is a WCF client that establishes a connection to the service by using the Service Bus APIs of the Windows Azure SDK together with the Service Model APIs of WCF. The connection to the Service Bus is secured by providing the appropriate authentication token. The endpoint definition for connecting to the service and the security credentials are all defined in the app.config file. Like the web application, the ExternalDataAnalyzer project connects to the Service Bus by using a specific identifier, externaldataanalyzer, which has been granted the privileges associated with the Send claim, enabling it to submit requests to the service but preventing it from performing other tasks such as listening for requests from other clients.
<?xml version="1.0" encoding="utf-8"?>
<configuration>
...
<system.serviceModel>
...
<behaviors>
<endpointBehaviors>
<behavior name="SharedSecretBehavior">
<transportClientEndpointBehavior
credentialType="SharedSecret">
<clientCredentials>
<sharedSecret issuerName=
"externaldataanalyzer"
issuerSecret="<data omitted>" />
</clientCredentials>
</transportClientEndpointBehavior>
</behavior>
</endpointBehaviors>
</behaviors>
...
</system.serviceModel>
</configuration>
Figure 5 summarizes the structure and implementation details of the OrderStatistics service and the ExternalDataAnalyzer client application.
Summary
This chapter concentrated on the deployment scenarios related to building applications where some parts run on-premises, some parts run in the cloud, and some parts are implemented by or for external partners. The topics in this chapter concern deployment challenges that Trey Research needed to tackle, such as locating data in the cloud or on-premises, synchronizing data across the different locations that are part of a hybrid solution, and generating business intelligence reports.
Because Trey Research used SQL Server in the original Orders application, deploying data to SQL Azure and using SQL Azure Data Sync to maintain consistency and replicate master data is a simple and natural way to migrate the application to the cloud.
Finally, having chosen to use SQL Azure to store data in the cloud, the SQL Azure Reporting Service is the obvious choice for implementing the business intelligence and reporting solution Trey Research requires, while the Service Bus Relay provides the ideal mechanism for publishing reporting data to partner organizations.
More Information
All links in this book are accessible from the book's online bibliography available at: http://msdn.microsoft.com/en-us/library/hh968447.aspx.
- Windows Azure features tour at http://www.windowsazure.com/en-us/home/tour/overview/.
- "Windows Azure Developer Center" at http://www.windowsazure.com/en-us/develop/overview/.
- "SQL Azure Data Sync Overview" at http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-data-sync-overview.aspx.
- "SQL Azure Data Sync FAQ" at http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-data-sync-faq.aspx.
- "SQL Azure Data Sync- Synchronize Data across On-Premise and Cloud (E2C)" at http://channel9.msdn.com/Series/SQL-Azure-Data-Sync/SQL-Azure-Data-Sync-Synchronize-Data-across-On-Premise-and-Cloud-E2C.
- "SQL Server to SQL Azure Synchronization using Sync Framework 2.1" at http://blogs.msdn.com/b/sync/archive/2010/08/31/sql-server-to-sql-azure-synchronization-using-sync-framework-2-1.aspx.
- "Business Analytics" (SQL Azure Reporting) at http://www.windowsazure.com/en-us/home/tour/business-analytics/.
- "One-Stop Shop for Premium Data and Applications" at http://datamarket.azure.com/.
- "Windows Azure AppFabric: An Introduction to Service Bus Relay" at http://www.microsoft.com/en-us/showcase/details.aspx?uuid=395930db-6622-4a9f-8152-e0cb1fc5149c.