teSAP PRESS :: SAP HANA Certification Guide - 0 Information Modeling Concepts

Key Concepts Refresher

This section looks at some of the core data modeling concepts used when working with SAP HANA and that will be covered on the exam. Let’s start from where the data is stored in SAP HANA by looking at tables.

Tables

Tables allow you to store information in rows and columns . Inside SAP HANA, there are different ways of storing data. We can either store it in a table as row-oriented or column-oriented . In a normal disk-based database, we use row-oriented storage because it works faster with transactions in which we are reading and updating single records. However, because SAP HANA works in memory, we prefer the column-oriented method of storing data in the tables, with which we can make use of compression in memory, remove the storage overhead of indexes, save a lot of space, and optimize for performance by loading compressed data into computer processors.

Once we have our tables, we can begin combining them in SAP HANA information models.

Views

The first step in building information models is building views. A view is a database entity that is not persistent and is defined as the projection of other entities, like tables.

For a view, we take two or more tables, link these different tables together on matching columns, and select certain output fields. Figure 4.1 illustrates this process.

Figure 4.1 Database View

There are four steps when creating database views :

Select the tables
Select two or more tables that have related information—for example, two tables listing the groceries you bought. One table contains the shop where you bought each item, the date, the total amount, and how you paid for your groceries. The other table contains the various grocery items you bought.

Link the tables
Next, link the selected tables on matching columns. In our example, perhaps our two tables both have shopping spree numbers, should ideally a unique number for every shopping trip you took. We call this a key field .
We can link the tables together with joins or unions .

Select the output fields
We want to show a certain number of columns that are of interest to us—for example, to use in our grocery analytics. Normally, you don’t want all the available columns as part of the output.

Save the view
Finally, save your view, and it’s created in the database. These views are sometimes called SQL views . We can even add some filters on our data in the view—for example, to show only the shopping trips in 2015 or all the shopping trips in which we bought apples.

When we call this view, the database gathers the data. The database view pulls the tables out, links them together, chooses the selected output fields, reads the data in the combined fields, and returns the result set. After this, the view “disappears” until we call it again. Then, the database deletes all of the output from the view. It does not store this data from the view inside the database storage.

[+] Tip

It’s important to realize that database views don’t store the result data. Each time you call a view, it performs the full process again.

You might have also heard about materialized views , in which people store the output created by a view in another table. However, in our definition of views, we stated that a view is a database entity that is not persistent; that is the way we use the term view in SAP HANA.

The data stays in the database tables. When we call the view, the database gathers the subset of data, showing us only the data that we asked for. If we ask a database view for that same data a few minutes later, the database will regather and recalculate the data again.

This concept is quite important going forward, because you make extensive use of views in SAP HANA. SAP HANA creates a cube-like view, for example, sends the results back to us, and “disappears” again. SAP HANA performs this process fast, and we avoid consuming a lot of extra memory by not storing static results. What really makes this concept important is the way it enables us to perform real-time reporting .

In the few minutes between running the same view twice, our data in the table might have changed. If we use the same output from the view every time, we will not get the benefit of seeing the effect of the updated data. Extensive caching of result sets does not help when we want real-time reporting.

[»] Note

Real-time reporting requires recalculating results, because the data can keep changing. Views are designed to handle this requirement elegantly.

We have discussed the basic type of database views here, but SAP HANA takes the concept of views to an entirely new level!

Before we get there, we’ll look at a few more concepts that we will need for our information modeling journey.

Cardinality

When we join tables together, we need to define how much data we expect to get from the various joined tables. This part of the relationship is called the cardinality .

There are four basic cardinalities that you will typically work with. The cardinality is expressed in terms of how many records on the left side join to how many records on the right side.

One-to-one
Each record on the left side matches with one record on the right side. For example, say that you have a lookup table of country codes and country names. In your left table, you have a country code called ZA. In the right table (the lookup table), this matches with exactly one record, showing that the country name of South Africa is associated with ZA.

One-to-many
In this case, each record on the left side matches with many records on the right side. For example, say you have a list of publishers in the left table. For one publisher—such as SAP PRESS—we find many published books.

Many-to-one
This is the inverse of one-to-many. For example, this case may apply when we have many people in a small village all buying items at the local corner shop.

Many-to-many
For example, this case may apply when many people read many web pages.

Joins

Before we look at the different types of joins, let’s quickly discuss the idea of “left” and “right” tables. Sometimes, it does not make much of difference which table is on the left of the join and which table is on the right, because some join types are symmetrical. However, with a few join types it does make a difference.

We recommend putting the most important table, seen as the main table , on the left of the join. An easy way to remember which table should be the table on the right side of the join is to determine which table can be used for a lookup or for a dropdown list in an application (see Figure 4.2).

Figure 4.2 Table Used for Dropdown List on Right-Hand Side of a Join

This does not always mean that this table has to be physically positioned on the left of the screen in our graphical modeling tools. When we create a join in SAP HANA, the table we start the join from (by dragging and dropping) becomes the “left table” of the join.

Databases are highly optimized to work with joins; they are much better at it than you can hope to be in your application server or reporting tool. SAP HANA takes this to the next level with its new in-memory and parallelization techniques.

In SAP HANA, there are a number of different types of joins . Some of these are the same as what you would find in traditional databases, but others are unique to SAP HANA. Let’s start by looking at the basic join types.

Basic Join Types

The first basic join types that you will find in most databases are inner joins , left outer joins , right outer joins, and full outer joins . We will discuss SQL in more detail in Chapter 8.

The easiest way to visualize these join types is to use circles, as shown in Figure 4.3. This is a simplified illustration of what these join types do. (The assumption is that the tables illustrated here contain unique rows; that is, we join the tables via primary and foreign keys, as illustrated in Figure 4.1. If the tables contain duplicate records, this visualization does not hold.)

Figure 4.3 Basic Join Types and Their Result Sets

The left circle represents the table on the left side of the join. In Figure 4.3, the left table contains the two values A and B. The right circle represents the table on the right side of the join and contains the values B and C.

Inner Join

The inner join is the most widely used join type in most databases. When in doubt and working with a non-SAP HANA database, try an inner join first.

The inner join returns the results from the area where the two circles overlap. In effect, this means that a value is shown only if it is found to be present in both the left and the right tables. In our case, in Figure 4.3, you can see that the value B is found in both tables.

Because we are only working with the “overlap” area, it does not matter for this join type which table is on the left or on the right side of the join.

Left Outer Join

When using an inner join you may find that you’re not getting all the data back that you require. To retrieve the data that was left out, you would use a left outer join. You will only use this join type when this need arises.

With a left outer join, everything in the table on the left side is shown (first table). If there are matching values in the right table, these are shown. If there are any “missing” values in the right hand table, these are shown with a NULL value.

For this join type, it is important to know which tables are on the left and the right side of the join.

Right Outer Join

The inverse of the left outer join is the right outer join, This shows everything from the right table (second table). If there are matching values in the left table, these are shown. If there are any “missing” values in the left hand table, these are shown with a NULL value.

Full Outer Join

As of SAP HANA SPS 11, we now have the full outer join. Many other databases also have this join type, so it is still regarded as one of the four basic join types.

This join type combines the results sets of both the left outer join and right outer join into a single result set.

Self-Join s

There isn’t really a join type called a self-join; the term refers to special cases in which a table is joined to itself. The same table is both the left table and the right table in the join relationship. The actual join type would still be something like an inner join.

This configuration is used mostly in recursive tables —that is, tables that refer back to themselves, such as in HR organizational structures. All employee team members have a manager, but managers are also employees of their companies and have someone else as their manager. In this case, team members and managers are all employees of the same company and are stored in the same table. Other examples include cost and profit center hierarchies or bills of materials (BOMs).

We return to this concept when we look at the Hierarchies section later in this chapter.

SAP HANA Join Types

The join types on the right side of Figure 4.4 are unique to SAP HANA: referential join, text join , and temporal join .

Figure 4.4 Basic Join Types and Some SAP HANA-Specific Join Types

Referential Join

The referential join type normally returns exactly the same results as an inner join . So, what’s the difference? There are many areas SAP HANA tries to optimize the performance of queries and result sets. In this case, even though a referential join gives the same results as an inner join , it provides a performance optimization under certain circumstances.

The referential join uses referential integrity between the tables in the join. Referential integrity is used in a business system to ensure that data in the left and right tables match. For example, we can’t have an invoice that isn’t linked to a customer, and the customer must be inserted into the database before we are allowed to create an invoice for that customer. Note that you do not need a value on both sides of the join: You may have a customer without an invoice, but you may not have an invoice without a customer.

If we can prove that the data in our tables has referential integrity, which most data in financial business systems has, then we can use a referential join. In this case, if we’re not reading any data from the right table, then SAP HANA can quite happily ignore the entire right table and the join itself, and it doesn't have to do any checking, which speeds up the whole process.

[+] Tip

A referential join is the optimal join type in SAP HANA.

Text Join

Another SAP HANA-specific joint type is a text join. The name gives a clue as to when we will use this type of join.

The right-hand table, as shown in Figure 4.5, would be something like a text lookup table—for example, a list of country names. On one side, we would have a country code such as US or DE, and in the text lookup table we would have the same country code and would link it with the key, and also would have the actual name of the country (e.g., United States or Germany).

Figure 4.5 Text Join

SAP sells software in many countries, and SAP business systems are available in multiple languages, and the name of the country and be translated into different languages. The fourth column in the text lookup table, called the language code, indicates into which language the country name has been translated. For example, for the country called DE and a language code of EN, the name of the country is in English and thus would be Germany. If the language code was DE (for German), the name of the country would be Deutschland, and if the language code was ES (for Español, indicating Spanish) then the name of the country would be Alemania.

Therefore, the same country can have totally different names depending on what language you speak.

In such a case, SAP HANA does something clever: By looking at the browser, the application server, or the machine that you are working on, it determines the language that you are logged on in. If you are logged in using English, it knows to use the name Germany. If you are logged on using German, it provides the German country name Deutschland for you.

The text join behaves like a left outer join but always has a cardinality of one-to-one (1:1). In other words, it only returns a single country name based on the language you’re logged on with. Even if you’re logged in via mobile phone, your mobile phone has a certain language setting.

Temporal Join

We use the temporal join when we’ve got FROM and TO date and time fields, or integers.

For example, say that you’re storing the fuel price for gasoline in a specific table. From the beginning of this month to the end of this month, gasoline sells for a certain price. Next month, the price differs, and maybe two weeks later, it’s adjusted again. Later, you’ll have a list of all the different gasoline prices over time.

As a car owner, you now want to perform calculations for how much you’ve paid for your gasoline each time you filled up your tank. However, you just have the number of gallons and the dates of when you filled up.

You can say, “OK, on this date, I filled up the tank.” You then have to look up which date range your filling-up date falls within and find the price for that specific date. You can then calculate what you paid to fill your tank for each of your dates. Figure 4.6 illustrates this example.

Figure 4.6 Temporal Join for Gasoline Prices

This date range lookup can be a little more complicated in a database. Sometimes, programmers read the data into an application server and then loop through the different records. In this case, a temporal join makes it easy, because SAP HANA will perform the date lookups in the FROM and the TO fields automatically for you, compare it to the date you’ve supplied, and get you the right fuel price at that specific date. It will perform all the calculations automatically for you—a great time and effort saver.

A temporal join uses either a referential join ; or an inner join to do the work. A temporal join requires valid to and from dates (in the gasoline price table) and a valid date-time column (in your car log book table).

Spatial Join

Spatial joins became available in SAP HANA SPS 09. SAP HANA provides spatial data types, which we can use, for example, with maps. These all have the prefix ST_. A location on a map, with longitude and latitude, would be an ST_POINT data type. (We will discuss spatial data and analytics further in Chapter 9.)

We can use special spatial joins between tables in SAP HANA (see Figure 4.7). Say that you have a map location stored in a ST_POINT data type in one table. In the other table, you have a list of suburbs described in a ST_POLYGON data type. You can now join these two tables with a spatial join and define the spatial join to use the ST_CONTAINS method. This will calculate for each of the locations (ST_POINT) in which suburb (ST_POLYGON) they are located (ST_CONTAINS).

Figure 4.7 Spatial Join

There are about a dozen methods like ST_CONTAINS available for spatial joins. For example, you can test if a road crosses a river, if one area overlaps another (e.g., mobile phone tower reception areas), or how close certain objects are to each other.

We have only mentioned the two-dimensional aspects in relation to maps, but many of these spatial functions can be used in three dimensions.

Dynamic Joins

A dynamic join is not really a join type; it’s a join property. It is also a performance enhancement available for SAP HANA. Once you have defined a join, you can mark it as a dynamic join. For this to work, you have to join the tables on multiple columns.

Let’s assume you define a static (normal) join on columns A, B, and C between two tables, as shown in Figure 4.8. In this case, the join criteria on all three columns will be evaluated every time the view is called in a query.

Figure 4.8 Dynamic Join on Multiple Columns

If you now mark the join as dynamic, the join criteria will only be evaluated on the columns involved in the query to the view. If you only ask for column A in your query of the view, then only column A’s join criteria will be evaluated. The join criteria on columns B and C will not be evaluated, and your system performance improves.

[+] Tip

If your query does not request any of the join columns in the dynamic join, it will produce a runtime error.

Core Data Services Views

CDS is a modeling concept that is becoming more important with every new release of SAP HANA. To understand why we need CDS, it is important to remember the wider context of how SAP HANA systems are deployed.

While learning SAP HANA, you work with a single system, but productive environments normally include development, quality assurance, and production systems. You perform development in the development environment, in which you have special modeling and development privileges. Your models and code then enter the quality assurance system for testing. When everyone is satisfied with the results, the models and code move to the production system. You do not have any modeling and development privileges in the quality assurance and production systems; you need a way to create views and other database objects in the production system, ideally without giving people special privileges.

An SAP HANA system can be deployed as the database for an SAP business system. In such a case, ABAP developers do not have any modeling and development privileges, even in the development system. We need a way to allow ABAP developers to create views and other database objects in the production system without these special privileges.

With that in mind, let’s look at why CDS is a good idea with the following example:

You need to create a new schema in your SAP HANA database. It’s easy to use a short SQL statement to do this, but remember the wider context. Someone will need the same create privileges in the production system! We want to avoid this, because giving anyone too much authorization in a production system can weaken security and encourage hackers.

CDS allows for a better way to create such a schema: Specify the schema creation statement once by creating a CDS (text) file with the schema_name="DEMO"; instruction. When we take this CDS file to the production system, the SAP HANA system automatically creates the schema for us. In the background, SAP HANA automatically converts the CDS file into the equivalent schema creation SQL statement and executes this statement.

The system administrators do not get privileges to create any schemas in the production system. They have rights to import CDS files, but they have no control over the contents of the CDS files; that’s decided by the business. CDS thus gives us a nice way to separate what the business needs from what database administration can do.

There’s much more to CDS. You can specify table structures (metadata) and associations (relationships) between tables with CDS, and taking this CDS file to production creates a new table. Even more impressive, if you want to modify the table structure later, SAP HANA does not delete the existing tables and recreate them; it automatically generates and executes the correct SQL statements to modify only the minimal changes to the existing table structures—for example, adding only a new column. CDS does not replace SQL statements, because it ultimately translates back into SQL statements. CDS adds the awareness to use a table creation SQL statement the first time it’s run but a table modification SQL statement the second time it’s run.

We can also create views using CDS. These CDS views can be used as data sources in our SAP HANA information views. Please note that these CDS views do not replace the more powerful information views we can create in SAP HANA.

ABAP developers can also use CDS inside ABAP. When a CDS view is sent to the SAP HANA system, it creates the necessary SAP HANA database objects without the ABAP developer having to log into the SAP HANA database. Because ABAP is database-independent, the same CDS file sent to another database will create the database objects relevant to that database. Because of this database independence, the versions of CDS for ABAP and for SAP HANA have slight differences.

The example screens in this book make use of the SHINE demo package described in Chapter 2. SHINE is a good example of CDS in action. You do not have to create a schema, create tables, or import data into these tables. When SHINE is imported into your SAP HANA system, it automatically creates all that content for you by using CDS statements.

Now, let’s take our knowledge of these concepts to the next “dimension.”

Cube

One of the most important modeling concepts we will cover is the cube . We’ll expand on this concept when we discuss the information views available in SAP HANA. In this section, we’ll use a very simplistic approach to describing cubes to give you an understanding of the important concepts.

When you look at a cube the first time, it looks like multiple tables that are joined together—and at a database level, that might be true. What makes a cube a cube are the “rules” for joining these tables together.

The tables you join together contain two types of data: transactional data and master data. As shown in Figure 4.9, we have a main table in the middle called a fact table. This is where our transactional data is stored. The transactional data stored in the fact table is referred to as either key figures or measures . In SAP HANA, we just use the term measures.

Figure 4.9 Cube Features

The side tables are called dimension tables , and this is where our master data is stored. The master data that is stored in the dimension table are referred to as characteristics or attributes , or sometimes even facets . In SAP HANA, we just use the term attributes.

To clear up these statements, let’s walk through an example. Our cube example will be simplified to communicate the most basic principles.

In our example, we’ll work with the sentence “Laura buys 10 apples.” You will agree that this sentence describes a financial transaction: Laura went to the shop to buy some apples, she paid for them, and the shop owner made some profit.

Where will we will store this transaction’s data in the cube? There are a couple of “rules” that we have to follow. The simplest rule is that you store the data with which you can perform calculations on in the fact table, and you store everything else that you can’t perform calculations on in the dimension tables.

In our transaction of “Laura buys 10 apples,” we first look at Laura. Can we perform a calculation on Laura? No, we can’t. Therefore, we should put her name into one of the dimension tables. We put her name in the top dimension table in Figure 4.10. Laura is not the only customer of that shop; her name is stored with those of many other people in this dimension table. We will call this table the Customer dimension table.

Figure 4.10 Financial Transaction Data Stored in a Cube

Next, let’s look at the apples. Can we perform calculations on apples? No, we can’t, so again this data goes into a dimension table. The shop doesn’t sell only apples; it also sells a lots of other products. The second dimension table is therefore called the Product dimension table.

In an enterprise data warehouse, we would normally limit the number of dimension tables that we link to the fact table for performance reasons. Typically, in an SAP BW system, we limit the number of dimension tables to a maximum of 16 or 18.

Finally, let’s look at the number of apples, 10. Can we perform a calculation on that? Yes, because it’s a number. We therefore store that number in the fact table .

To complete the picture, if Laura buys more apples, then she is a top customer; let’s say she’s customer number 2. Apples are something the store keeps in stock, so apples are product number 7.

If we store a link (join) to customer 2 and another link (join) to product 7 and the number 10 (number of apples), we would say that that constitutes a complete transaction. We can abbreviate the transaction “Laura buys ten apples” to 2, 10, and 7 in our specific cube.

Laura and apples are part of our master data. The transaction itself is stored in the fact table. We’ve now put a real-life transaction into the cube.

Hopefully, this gives you a better idea of what a cube is, and how to use it. Let’s complete this short tour of cubes by clarifying a few terms ahead.

Attributes and Measures

As previously shown in Figure 4.9, attributes are stored in dimension tables, and measures are stored in fact tables. Let’s define attributes and measures:

Attributes
Attributes describe a transaction, like the customer Laura or the product apples. This is also referred to as master data. We cannot perform calculations on attributes.

Measures
Measures are something we can measure and perform calculations on. Measurable (transactional) information goes into the fact table.

Fact Tables in the Data Foundation

Figure 4.10 showed only a single fact table, but sometimes we have more complex cubes that include multiple fact tables. In SAP HANA, we call a group of fact tables (or the single fact table) the data foundation.

When we join the data foundation (fact tables) to the dimension tables, we always put the data foundation on the left side of the join.

Star Joins

The join between the data foundation and the dimension tables is normally a referential join in SAP HANA. We refer to this as a star join, which indicates that this is not just a normal referential join between low-level tables, but a join between a data foundation and dimension tables. This is seen as a join at a higher level.

Originally, we called these logical joins in SAP HANA, however they were renamed star joins as of SAP HANA SPS 10.

With all the building blocks in place, let’s start examining SAP HANA information views.

Information Views

Let’s start our introduction to SAP HANA information views by looking back at what we learned when we discussed cubes.

Dimension Views

The first thing we need to build cubes is the master data. Master data is stored in dimension tables. Sometimes, these dimension tables are created for the cubes. In SAP HANA, we do not create dimension tables as separate tables; instead we build them as views.

We join all the low-level database tables that contain the specific master data we are interested in into a view. For example, we might join two tables in a view to build a new Products dimension in SAP HANA, and maybe another three tables to build a new Customers dimension.

In this book, we will refer to these type of views simply as dimension views . In SAP HANA, there are currently two types of dimension views available:

Attribute views
The older version of dimension views are called attribute views. This name refers to the data, called attributes, stored in dimension tables. Attribute views are still used in exceptional cases in SAP HANA SPS 11.

Calculation views of type dimension (dimension calculation views)
As of SAP HANA SPS 10, we now use calculation views of type dimension , also called dimension calculation views. In SAP HANA SPS 11, a new migration tool is available to help migrate your old attribute views to the new dimension calculation views.

Just as the same master data is used in many cubes, so will we reuse dimension views in many of the other SAP HANA information views. You can even build your own “library” of dimension views.

Star Join Views

In the same way, instead of storing data in cubes, we can create another type of view in SAP HANA that produces the same results as a traditional cube. There are currently two types of SAP HANA views that produce the same results as an old-fashioned, traditional cube:

Analytic views
We still have an older version of these SAP HANA views called analytic views . This name refers to the fact that we use cubes for analytics. Analytic views are still used in some cases in SAP HANA SPS 11.

Calculation views of type cube with star join
As of SAP HANA SPS 10, we now use calculation views of type cube with star join . In SAP HANA SPS 11, a new migration tool is available to help you to migrate your old analytic views to the new calculation views of type cube with star join.

[»] Terminology

In this book, we will simply refer to these types of views as star join views, not as cube views, which would lead to confusion. This is because the third type of SAP HANA views is called calculation view of type cube.

The results created by the star join views in SAP HANA are the same as that of a cube in an enterprise data warehouse, except that in a data warehouse, the data is stored in a cube. With SAP HANA, the data is not stored in a cube, but is calculated when required.

Calculation Views of Type Cube

This view type, provides even more powerful capabilities. For example, you can combine multiple star join views (“cubes”) to produce a combined result set. In SAP Business Warehouse (BW), we call this combined result set a MultiProvider.

In SAP HANA, we call these calculation views of type cube . Use cases for this type of view include:

In your holding company, you want to combine the financial data from multiple subsidiary companies.
You want to compare your actual expenses for 2015 with your planned budget for 2015.
You have archived data in an archive database that you want to combine with current data in SAP HANA’s memory to compare sales periods over a number of years.

Using Information Views

Let’s examine how to use the different types of SAP HANA information views together in information modeling. Figure 4.11 provides an overview of everything you’ve learned in this chapter and how each topic fits together.

Figure 4.11 Different SAP HANA Information Views Working Together

On the top left-hand side, you can see our source systems: an SAP source system and a non-SAP source system. SAP HANA doesn’t care from which system the data comes. In many projects, we have more non-SAP systems that we get data from than SAP systems. We extract the data from these systems using various data provisioning tools, which we’ll look at in Chapter 14.

The data is stored in row tables in these source systems. When we use our data provisioning methods and tools, the data is put into column tables in the SAP HANA database. This is illustrated in the bottom-left corner of Figure 4.11.

Building the Information Model

Now, let’s start building our information models on top of these tables.

If we want to build something like a cube for our analytics or reporting, we use SAP HANA information views. First, we need the equivalent of a dimension table , such as our Product or Customer dimension table. How would we build this?

We will build a dimension view (a calculation view of type dimension, or an attribute view). This is indicated in Figure 4.11 by Laura and Apples, which represent the Customer and Product dimension views. We create these dimension views the same way we create any other view: Take two tables, join them together, select certain output fields, and save it.

Once we have our dimension views, we build a data foundation for the cube. The data foundation is for all our transactional data. The data foundation stores measures, like the number 10 for the 10 apples that Laura bought. We build a data foundation in a similar fashion to building a dimension view: We take multiple tables or a single table. If using multiple tables, we join them together, select the output fields, and build our data foundation.

To complete the cube, we have to link our data foundation (fact tables ) to the dimensions with a star join. Finally, to complete our star join view , we also select the output fields for our cube. The created star join view will produce the same result as a cube, but in this case no data is stored and it performs much better.

In the next step, we can create calculation view of type cube , in which we combine two star join views (cubes). We can combine the star join views using a join or a union. We recommend a union in this case, because it leads to better performance than a join.

Finally, we can build reports, analytics, or applications on our calculation view. This is shown in the top right of Figure 4.11.

Using the Information Model

In the previous section, we looked at how to build an information model from the bottom up. Let’s now think about the process from the other side and look at what happens when we call our report or application that built on the calculation view . This is the top-down view of Figure 4.11.

The calculation view does not store the data: It has to gather and calculate all the data required by the report. SAP HANA looks at the calculation view’s description and says, “Oh, I’ve got to build two star join views and union them together. How do I build these star join views?”

SAP HANA then goes down one level and says, “OK, how do I build the first star join view? I need a data foundation, and I need two dimension views, and I join them together with a star join.”

Then, it goes down another level: “How do I build the dimension view? I’ll read two different tables, join them together, and select the output fields.” SAP HANA then builds the two dimension views.

Going back up one level, SAP HANA needs to combine these two dimension tables with a data foundation. It creates the data foundation and joins it with the two dimension tables. It also builds the second star join view in the same way. Going up to the level of the calculation view, it combines the two star join views with a union. Finally, SAP HANA sends the result set out to the report, and then cleans out the memory again. (It does not store this data!) If someone else asks for the exact same data five minutes later, SAP HANA happily performs the same calculations all over again.

At first, this can seem very wasteful. Often we’re asked “Why don’t you use some kind of caching?” However, caching means that we store data in memory. In SAP HANA the entire database is already in memory. We don’t need caching because everything is in memory already.

The reason people ask this question is that in their traditional data warehouses, data normally doesn’t change until the next evening when the data is refreshed. Inside SAP HANA, the data can be updated continuously, and we always want to use the latest data in our applications, reports, and analytics. That means we have to recalculate the values each time; caching static result sets will not work.

In summary, to use real-time data , you have to continuously recalculate.

Other Modeling Artifacts

We will now discuss some of the other modeling concepts we will use when building our information views.

The power of calculation views of type cube is that you can build them up in many layers. In the bottom layer, you can perhaps have a union of two star join views (cubes), as shown in Figure 4.12. In the next (higher) layer, we sum up all the values from the union. In the level above that, we rank the summed results to show the top 10 most profitable customers.

Figure 4.12 Calculation View of Type Cube Built in Various Layers

Projection

A projection is used to change an output result set. For example, we can get a result set from a star join view (cube) and find that it has too many output fields for what we need in the next layers. Therefore, we use a projection as an additional in-between layer to create a subset of the results.

We can also use the projection to calculate additional fields—for example, sales tax—or we can rename fields to give them more meaningful names for our analytics and reporting users.

Ranking

As illustrated in the top node of Figure 4.12, we can use ranking to create useful top-N or bottom-N reports and analytics. This normally returns a filtered list (e.g., only the top 10 companies) and is sorted in the order we specify.

Aggregation

The word aggregation means a collection of things. In SQL, we use aggregations to calculate values such as the sum of a list of values. Aggregations in databases provide the following functions:

Sum
Adds up all the values in the list

Count
Counts how many values (records) there are in the list

Minimum
Finds the smallest value in the list

Maximum
Finds the largest value in the list

Average
Calculates the arithmetical mean of all the values in the list

Standard deviation
Calculates the standard deviation of all the values in the list

Variance
Calculates the variance of all the values in the list

Normally, we would only calculate aggregations of measures. (Remember that measures are values we can perform calculations on, which usually means our transactional data.) For example, we could calculate the total number of apples a shop sold in 2015. However, in the latest versions of SAP HANA, we can also create aggregations of attributes, essentially creating a list of unique values.

Union

A union combines two result sets. Figure 4.12 illustrates this by combining the results of two star join views.

In SAP HANA graphical calculation views, we’re not restricted to combining two result sets; we can combine multiple result sets. SAP HANA merges these multiple result sets into a single result set.

Let’s say we have four result sets—A, B, C, and D—that we union together. Although we union multiple result sets, that doesn’t mean the performance will be bad. If SAP HANA detects that you are not asking for data from A and D, it will not even generate the result sets for A and D; it will only work with B and C.

[»] Note

The union expects the two result sets to have the same number of columns, in the same order, and the matching columns from each result set must have similar data types.

You can have variations in unions, such as the following:

Union all
Will merge the result sets and return all records

Union
Will merge the results and only give you the unique results back

Union with constant values
Lets you “pivot” your merged results to help get the data ready for reporting output

Figure 4.13 compares the output generated by a union to that of a union with constant values.

Figure 4.13 Standard Union Compared to Union with Constant Values

Often, we want a report that looks like the top right of the figure. If we’re given the output from the standard union, it takes more work to create a report with sales for 2015 and 2016 in separate columns. With the output from the union with constant values, writing the report is easy.

In Figure 4.11, we showed that you should use a union rather than a join for performance reasons—but what happens if the two star join views that you want to union together do not have the same outputs? In that case, the union will return an error. How do we solve this problem?

Figure 4.14 illustrates an example. On top, we have the ideal case in which both star join views have matching columns. In the middle, we have a case in which only column B matches. Normally, we would use a join, as shown.

Figure 4.14 Standard Union, Join, and Join Converted to Union

We solve this problem by creating a projection for each of the two star join views. We add the missing columns and assign a NULL value to those columns. We then perform the union on the two projections, because they now have matching columns.

Semantics

An important step that many modelers skip is to add more meaning to the output of their information models. Semantics, as a general term, is the study of meaning. We can use semantics in SAP HANA to, for example, rename fields in a system so they are more clear. Some database fields have really horrible names. Using semantics, you can change the name of these fields to be more meaningful for your end users. This is especially useful when they create reports. If you have a reporting background, you will know that a reporting universe can fulfill a similar function.

We can also show that certain fields are related to each other—for example, a customer number and a customer name field.

One example that is used often is that of building a hierarchy. By using a hierarchy, you can see how fields are related to each other with regards to “drilling down” for reports.

Hierarchie s

Hierarchies are used in reporting to enable intelligent drilldown. Figure 4.15 illustrates two examples: Users can start with a list of sales for all the different countries; they can then drill down to the sales of the provinces (or states) of a specific country, and finally down to the sales of a city.

End users can see the sales figures for just the country, state, or province or for a specific city. Even though countries and cities are two different fields, they’re related to each other by use of a hierarchy. This extra relationship enhances the data and gives it more meaning (semantic value).

Figure 4.15 Examples of Level Hierarchies

There are two different types of hierarchies: level hierarchies and parent–child hierarchies . The hierarchy we just described is the level hierarchy; level 1 is the country, level 2 is the state or province, and level 3 is the city or town.

Another example of a level hierarchy is a time-based hierarchy that goes from a year to a month to a day. You can even keep going deeper, down to seconds. This is illustrated on the right side of Figure 4.15.

HR employee organizational charts or a cost center structures are examples of parent–child hierarchy, as illustrated in Figure 4.16. In this case, we join a table to itself, as described in the Self-Joins section.

Figure 4.16 Parent–Child Hierarchy Example

Table 4.1 provides a quick summary of the differences between the two types of hierarchies and when you should use which one.

Level Hierarchies	Parent–Child Hierarchies
Fixed (rigid) number of levels in the hierarchy.	Variable hierarchy depth.
Different data fields on every level of the hierarchy (e.g., country, state, city).	The same (two) fields used on every level of the hierarchy.
The fields on each level of the hierarchy can be different data types (e.g., country can be text, while zip/postal code can be numeric).	The parent and child fields have the same data type.

Table 4.1 Comparing Level Hierarchies and Parent–Child Hierarchies

Best Practices and Modeling Guidelines

Let’s end our tour of modeling concepts with a summary of best practices for modeling views in SAP HANA. Many of the basic modeling guidelines that we find in traditional data modeling and reporting environments are still applicable in SAP HANA. We can summarize the basic principles as follows:

Limit (filter) the data as early as possible
It’s wasteful to send millions of data records across a slow network to a reporting tool when the report only uses a few of these data records. It makes a lot more sense to send only the data records that the report requires, so we want to filter the data before it hits the network.
In the same way, we don’t want to send data to a star join view when we can filter it in the earlier dimension view. Hence, the rule to filter as early as possible.

Perform calculations as late as possible
Assume you have column A and column B in a table. The table contains billions of records. We can use the values of column A and column B in a formula for every data record, which means the server performs billions of calculations, and finally add up the sum total of the calculated fields.
It would be much faster to add up the sum total for column A and the sum total for column B and then calculate the formula. In that case, we perform the calculation once instead of billions of times. We delay the calculation as long as possible for performance reasons.