Key Concepts Refresher
In the preceding chapters, we’ve discussed many performance enhancements that are built into the design of SAP HANA; in this chapter, we’ll walk through various optimization topics in SAP HANA.
Architecture and Performance
The largest and most fundamental change that comes with SAP HANA technology is the move from disks to memory. The in-memory technology that SAP HANA is built on has been one of the biggest contributors of performance improvements. The decision to break with the legacy of disk-based systems and use in-memory systems as the central focus point (and everything that comes with this) has led to significant changes in performance.
The next step was to combine memory and multicore CPUs. If we store everything in memory and have plenty of processing power, we do not have to store calculated results. We just recalculate them when we need them—which opens the door to real-time computing. Real-time computing requires that results be recalculated each time they’re needed, because the data might have changed since the last query.
This combination of memory and CPU processing power led to new innovations. Now, you can combine OLTP and OLAP together in the same system: The OLAP components don’t need the storage of precalculated cubes and can be recalculated without impacting the performance of the OLTP system.
Another innovations prompted by combining memory and CPU is data compression. This both reduces memory requirements and helps to better utilize CPUs by loading more data in the level 1 CPU cache. Loading more data into CPUs once again improved performance.
To improve compression, SAP HANA prefers columnar storage. Along with many advantages, columnar storage also has a few disadvantages: Inserting new records and updating existing records are not handled as efficiently in column tables as in row tables. To solve this, SAP HANA implemented the insert-only principle with delta buffers. We use row storage for delta buffers to improve insert speeds and then use delta merges to bring the data back to columnar storage, which leads to better read performance.
[+] Tip
The performance improvements in SAP HANA do not mean that we can be negligent in modeling.
For example, you should not use SELECT *
statements or choose any unnecessary fields in queries, because doing so decreases
the efficiency of the column storage.
Using multicore processors, everything in SAP HANA was designed from the ground up to run in parallel. Even a single aggregation value is calculated using parallelism. This led to SAP HANA growing from a single hardware appliance to a distributed scale-out system that can work across multiple server nodes and finally to multitenant database containers. Tables can now also be partitioned across multiple servers. With more servers, the scale and performance of SAP HANA systems has vastly advanced.
Redesigned and Optimized Applications
While combining memory and processors, we also saw that moving data from memory to application servers is not the optimal way to work with an in-memory database. It makes more sense to push down some of the application logic to the SAP HANA system for fast access to the data in memory and plenty of computing resources for calculations.
SAP started by making SAP HANA available as an accelerator and sidecar solution to improve issues with long-running processes in current systems. Over time, SAP business applications were migrated to SAP HANA and re-engineered to make use of the advantages it offers.
With SAP S/4HANA, we now have a system that uses views instead of materialized aggregates in lookup tables, significantly reducing the size of the system, simplifying the code, and enriching the user experience.
Information Modeling Techniques
In previous chapters, we discussed some techniques that can help optimize information models, such as the following:
- Limit data as quickly as possible. Do this by defining filters on the source data, applying variables in your SAP HANA views, and selecting only the fields that you require in your query results. Also, let SAP HANA perform the calculations and send only the results to a business application, instead of sending lots of data to the application and performing the calculations there.
- Perform calculations as late as possible. For example, perform aggregation before calculation in analytic views.
- Use referential joins when your data has referential integrity. When working with referential integrity, you do not have to evaluate the join for all queries.
- Similarly, use dynamic joins to optimize the join performance for certain queries.
- Join tables on key fields.
- Use a union instead of a join for combining large sets of data; make unions even more efficient by using union pruning.
- Only use a single fact table in a data foundation of a star join view.
- Build up your information views in layers.
If you use these techniques, you will avoid many optimization hazards. These techniques should be part of your everyday workflow.
[+] Tip
The best optimization is no optimization. If you avoid the pitfalls in this chapter, you will not need to optimize later.
In spite of your best efforts, sometimes you need to optimize an information model.
Optimization Tools
When optimizing an SAP HANA information model, a number of tools are at your disposal. You need to know when to pick which optimization tool and how to use it.
SAP HANA Engines
SAP HANA has several internal components called engines. Each of these engines is optimized to provide specific functionality for the processing of SAP HANA information models:
- Row engine
The row engine is used for processing data from row tables and certain features that are traditionally expected from relational databases. - Column engine
This engine processes data from the column table and associated features. - Join engine
Dimension views almost exclusively use joins between tables and therefore send most of their processing to the join engine. - OLAP engine
Star join views are OLAP “cubes.” The OLAP engine specializes in handling cube-type processes and is traditionally associated with analytic views. - Calculation engine
Calculation views and calculated columns use the calculation engine. The spatial engine is added functionality in the calculation engine that provides the spatial capabilities of SAP HANA. - SQL engine
This engine is used for processing the SAP HANA Live models, which are mostly focused on relational processes.
These engines work together to process information models. However, transferring information from one engine to another can affect performance.
[+] Tip
Reduce the amount of data transferred between the engines.
A better idea of how this process between engine works can be seen when using a star join view with calculated columns. In this case, SAP HANA might use the OLAP engine for processing the star join view and the calculation engine for processing the calculated columns. If you plan to use a calculation view of type cube on top of the star join view later, it might make sense to move the calculated columns to the calculation view of type cube. This is in line with our general recommendation to calculate as late as possible.
[»] Note
As SAP HANA improves and matures, there will be less emphasis on the engines and the transfer between engines. With SAP HANA now providing a migration tool to help move to calculation views, this implies less emphasis on knowing about and compensating for the transfer between engines.
Now that you have a better understanding of SAP HANA’s engines, in the next section we’ll begin discussing the tools available for optimizing performance.
Explain Plan
One of the purposes of the Explain Plan tool is to show which engines are used when processing an information model and if there is a transfer between engines involved. It also shows what operators are processed—for example, a join, a column table, a column search, or a filter. This tool does not provide any information about the actual processing, such as how many records were processed or how long it took.
You can access the Explain Plan tool from the context menu of the SQL Console.
[+] Tip
There is a quick way to reach any SAP HANA view from the SQL Console window: Right-click the name of the view in the Content area and select the Generate Select SQL option from the context menu. This will open a SQL editor with an equivalent SQL query for the selected information view.
Figure 10.1 shows the Purchase Overview calculation view, opened from the SHINE content. From the context menu in the SQL Console, choose the Explain Plan option to launch the tool for this SQL query.
Figure 10.1 Open Explain Plan from Context Menu
Figure 10.2 shows the Explain Plan with the SQL query on the top and the execution engines and the names of the operators used when running the SQL query. In the Operator Details column, you can see which tables are used here.
Visualize Plan
From the same context menu in the SQL Console, you can also launch the Visualize Plan tool. The Visualize Plan tool has two options, as shown in Figure 10.3: Prepare and Execute.
Figure 10.3 Visualize Plan Context Menu Options in SQL Console
The Visualize Plan tool visually shows how the SQL query will be prepared and executed. Figure 10.4 illustrates the prepared execution plan. The bottom row shows the various column tables that will be read, and the lines going to the next block up (a join node) show how many records will be read from each table. The join node output again shows the number of rows.
Figure 10.4 Prepared Execution Plan Visualized
This information is helpful when you want to reduce the data transferred between layers as quickly as possible or filter as early as possible, as recommended. You can view the prepared plan and find out how many records will be generated in the result set. Then, you can apply a filter, for example, and rerun the prepared plan to see the difference.
Once you’ve viewed the prepared execution plan, you can execute the SAP HANA information model and call the Visualize Plan tool for the executed plan. You will see the Overview tab (Figure 10.5).
Figure 10.5 Executed Plan Overview Tab
From the Overview tab, you can see how long the query took, how much memory it used, the number of tables that were used, and how many records were returned in the result set.
Next to the Overview tab is the Executed Plan tab, which shows the executed plan visually, with the execution times for each operator block. You can expand an operator block by clicking on the small triangle at the top-right of the block (see Figure 10.6). Here, you can see the execution time for every block and how many records are passed between the operations.
Figure 10.6 Executed Plan Visualization and Timeline View
Figure 10.6 also shows the timeline view, which presents timelines for various operations. The top and bottom portions of the screen are linked, so if you select a line on the timeline view, the corresponding operations block in the top graph is selected as well.
Next to the Timeline tab, you’ll see the Operator list tab (Figure 10.7), which provides more detail on every operator used when executing the query.
Figure 10.7 Executed Plan Visualization and Operator List
As you can see, the Visualize Plan is a powerful tool that can help you optimize SAP HANA information models.
Administration Console
We briefly mentioned the Administration Console in Chapter 5. There are three tabs that are especially relevant to optimization:
- Performance
The Performance tab has several subtabs. The SQL Plan Cache subtab (shown in Figure 10.8) contains a list of all the SQL statements executed in the SAP HANA system. You can sort this list according to certain criteria, such as the amount of memory used, the number of rows returned, and execution time. That way, you can find the 10 statements that took the most memory quickly and discover where the memory in your SAP HANA system disappeared to.You can open any of the SQL statements in the Visualize Plan or in the SQL Console, where you can test variations and measure the impact of your changes.
- Diagnosis Files
Next is the Diagnosis Files tab. Figure 10.9 shows a list of trace and log files. We filtered this list to the “xs” search string, meaning that it only shows files that have “xs” in the file name. You can read these files in SAP HANA studio, or if they’re too large, you can download them. You can download them as-is or compressed in a ZIP file.Figure 10.9 Filtered List of Trace and Log Files in Diagnosis Files Tab
These trace and log files can provide detailed process information, and SAP HANA can trace very specific events; for example, you can limit a trace file to a specific user or application.
- Trace Configuration
To set up specific traces, use the Trace Configuration tab, shown in Figure 10.10. This tab allows you to set up various trace configurations. We won’t go into any detail about available trace configurations here; this information is discussed in SAP HANA technical training, because these traces are normally executed by SAP HANA database or system administrators. You’ll need to work with your SAP HANA administrator if you want to run any traces.Figure 10.10 The Trace Configuration tab of the Administration Console.
Performance Analysis Mode
The last tool that we will look at in this chapter is Performance Analysis Mode, available from the top-right toolbar when you build SAP HANA information views.
You can switch this mode on or off when designing information views. The tool will analyze your view and suggest possible improvements.
Figure 10.11 shows the dropdown menu for Performance Analysis Mode. The first menu option allows you to switch this mode on or off.
Figure 10.11 Performance Analysis Mode Menu
Before we discuss the main features of Performance Analysis Mode, let’s briefly detour to look at the last option on the menu shown in Figure 10.11. The Visualize View in Planviz Editor option opens a simplified graphical flow of the information view (Figure 10.12).
When you select a node, you can see the attributes and measures on the left of the graphical flow and the input and output fields on the right.
Figure 10.12 Visualize View Called from Performance Analysis Mode Menu
Go back to the Performance Analysis Mode and you will notice that a new tab, Performance Analysis, has been added to the View Editor in which you design the information view. When you select a node—for example, the join node—in your view, and then then open the Performance Analysis tab (top middle), you will now see improvement recommendations from the SAP HANA system.
Figure 10.13 returns to the Purchase Overview calculation view from the SHINE content.
Figure 10.13 Performance Analysis Mode Optimization Suggestions
As shown in Figure 10.13, we’ve highlighted three possible improvement suggestions:
- Recommended join type or cardinality is different from what you have selected.
In this case, we did not specify any cardinality on the join. SAP HANA analyzed the data and view and suggested that a 1..n cardinality should be set. Setting the cardinality will allow SAP HANA to execute the view faster, because it won’t have to calculate what the cardinality is first. - Recommended: Referential join type.
SAP HANA looked at the data to see if referential integrity was ensured. Because there is referential integrity in this case, SAP HANA suggested that a referential join would be better in the join node of this view. - Threshold value exceeded. Check performance workbench preferences.
Here, SAP HANA looked at how many records will be returned by the join. If the number of records exceeds a threshold, SAP HANA suggests that partitioning the table could speed up the join. The system shows us information about the number of rows and the partitions.
Figure 10.14 shows additional information for when you’re joining tables from remote sources using SAP HANA smart data access (SDA). (We will discuss SDA in Chapter 14.)
Figure 10.14 Performance Analysis Mode Suggestions to Optimize SAP HANA Information View Built on Virtual Tables
You can set validation rules related to performance in the SAP HANA studio preferences. From the Window menu, select Preferences · SAP HANA · Modeler · Validation Rules. Figure 10.15 shows validation rules related to performance.
Figure 10.15 Validation Rules Related to Performance in SAP HANA Studio
Best Practices for Optimization
Performance Analysis Mode suggested some improvements we could make to our information view with regards to specifying the cardinality, using a referential join, and potentially partitioning our tables. In this last section, we will discuss some additional best practices for modeling in SAP HANA. Remember that we mentioned a few earlier in the chapter when we summarized concepts you’ve learned previously in the book.
We can divide these best practices into two areas—namely, those related to the graphical view design and those related to SQL and SQLScript.
Graphic Information Modeling Best Practices
The following are some additional best practices to optimize graphical information models:
- Minimize data transfer between SAP HANA execution engines.
- Reduce data transfer between information views.
- Implement filters at a table level.
- Don’t send all information to the analytics and reporting tools; instead, make your reports more interactive and make multiple smaller requests from SAP HANA. The SAP HANA system will use less memory, be more efficient, and respond faster. With less data moving across the network, the user experience improves. In addition, by sending less information to the analytics and reporting tools, you make these tools more mobile-friendly.
- Select only the fields you require; columnar databases don’t like working with all the columns.
- Specify the cardinality in a join.
- Use left outer rather than right outer joins.
- Investigate whether partitioning your tables is a valid option.
- Let SAP HANA do hard work like aggregations and calculations.
[+] Tip
Remember that SAP HANA uses authorizations as a filter. If a user is only allowed to see one cost center, then SAP HANA will use that as a filter to speed up the query. In this case, security is not an overhead as in many other systems but an optimization technique.
SQL and SQLScript Guidelines
We mentioned several best practices for optimizing SQL and SQLScript in Chapter 8, including the following:
- Do not use SELECT * statements.
- Try to avoid imperative logic with a loop or branch logic. SQL uses a set-oriented approach for best performance.
- Break large, complex SQL statements into smaller independent steps using SQLScript variables. This can improve the parallelization of the execution of the query.
- Use procedures or GROUPING SETS to return multiple results sets.
[+] Tip
For more optimization and performance tips, watch Werner Steyn’s excellent one-hour session, named DMM208, from SAP TechEd 2015 at http://events.sap.com/teched/en/session/26543.
You can also attend the HA215 training course. You can find more information about this course at https://training.sap.com/shop/course/ha215-sap-hana-monitoring-and-performance-tuning-classroom-010-g-en/.