8 Integrating External Data
Because companies might use non-SAP databases to store their data, integrating SAP and non-SAP data is critical for enterprise reports. In this chapter, we discuss SDA and SDI in SAP BW/4HANA, which make accessing external data more efficient and cost-effective.
Companies often replicate or load data from other systems into an SAP BW/4HANA system and then perform analysis and processing in SAP BW/4HANA. However, not only does data loading or replication consume a lot of time and system resources (i.e., CPU and memory), but also, it can be difficult to manage delta loading from source systems to SAP BW/4HANA.
In this chapter, we’ll look at how SAP HANA smart data access (SDA) and SAP HANA smart data integration (SDI) can help you load your data into the SAP BW/4HANA system. We’ll then discuss the options for integrating data with the SAP BW/4HANA system, including combining data under a BW schema or via Open ODS views.
8.1 SAP HANA Smart Data Access
SDA enables remote data to be accessed via SAP HANA SQL queries as if they’re local tables in SAP HANA, without copying the data into the SAP HANA database. With SDA, data can be combined from heterogeneous remote source systems—such Microsoft SQL Server, Oracle, DB2, Microsoft Excel, Teradata, Sybase IQ, and Hadoop—into EDW landscapes. SDA makes it possible to access remote data without having to replicate the data into the SAP HANA database. The following are some use cases for which to consider using SDA:
- Making other data warehouses transparent for SAP BW/4HANA. Previously, you needed to set up a DB connection to load data into SAP BW/4HANA.
- Consolidating your data warehouse landscape.
- Consuming remote data from multiple connected remote databases in SAP BW/4HANA.
SDA not only provides operational and cost benefits, but also supports the development and deployment of the next generation of analytical applications, which require the ability to access and integrate data from multiple systems in real time regardless of where the data is located or which system is generating it.
SDA is based on local virtual tables that map to existing remote tables in the remote database. Data required from remote sources will be visible in virtual tables in SAP HANA. SDA will enable remote access (smart access) to data regardless of where the data is located.
SAP HANA developers can then create SAP HANA views on top of the virtual tables or combine data from virtual tables and internal SAP HANA tables in SAP HANA Studio. The SAP HANA query processor optimizes these queries and executes the relevant parts of the queries in the remote database, returns the results of the query to SAP HANA, and completes the query.
Not only can an SAP HANA developer select data in a virtual table and pull data from a remote data source into an SAP HANA table, but SDA even supports INSERT/UPDATE/DELETE statements executed against the virtual table. The data in a virtual table can be modified (i.e., via insert, update, and delete operations) and then written back to the remote table in a remote data source.
Before you create remote connections to external databases, it’s a good idea to examine if your databases are supported by SDA or SDI in SAP BW/4HANA, as we’ll discuss in the following section. We’ll then briefly look at the steps to configure SDA and how to create remote sources.
8.1.1 Supported Remote Sources
As of SAP HANA SPS 11, the following remote sources are supported by SDA (refer to SAP Note 1868209 for more information):
- Oracle 12c
- Microsoft SQL Server 2008, 2012
- SAP HANA (SAP BW powered by SAP HANA, SAP Business Suite on SAP HANA)
- SAP ERP and SAP BW
- Hadoop
- Excel worksheets
- Text files
- Teradata
- DB2
- Sybase IQ
- Apache Hive 0.9.0 or higher and Simba Hive ODBC driver
8.1.2 Configure SAP HANA Smart Data Access
The steps to configure are as follows:
- Download and install the Unix ODBC driver manager (2.3.0 or above) from http://www.unixodbc.org/ and use it to test the connection to external databases in the Linux box on the SAP HANA server. Because you need administrator rights, you may need to get help from your SAP Basis team to install the driver manager. For more information, see the SAP HANA Administration Guide at https://help.sap.com/hana/SAP_HANA_Administration_Guide_en.pdf.
- Download and install the Unix ODBC drivers for external databases from each database provider’s website. Because installing Unix ODBC drivers requires administrator rights, you may need to get help from your SAP Basis team. For more information, see the SAP HANA Administration Guide.
- To start the process to create the odbc.ini file (which SAP HANA requires to communicate with remote databases) in $HOME directory in the SAP HANA Linux server, log in with the admin user and create odbc.ini file.
-
Create a DSN for external databases. You will perform the following:
-
Configure SQL server in the odbc.ini file as follows:
[DSN]
SERVER = <your server>, <port number>
Driver = <Location of the MS SQL Server ODBC driver>
Database = <your database name> -
Configure the Hadoop system in the odbc.ini file as follows:
[HIVE]
Driver = <Location of the Hive ODBC driver>
HOST = <Hadoop host name>
Port = <port number>Note
For SQL server, the default port number is 1433. To find the SQL server port being used, open the SQL Server Configuration Manager, then locate the TCP port (see Figure 8.1).
For an SQL server, for example, the driver is /opt/microsoft/msodbcsql/lib64/ libmsodbcsql-13.0.so.0.0
For Hadoop, the driver is /simba/hiveodbc/lib/64/libsimbahiveodbc64.so
For database, enter the name of database you want to connect to; if the database isn’t specified, the system will connect to the default database of the connected server.
Figure 8.1 Locate TCP Port in SQL Server Configuration Manager
-
Configure SQL server in the odbc.ini file as follows:
- Restart SAP HANA.
-
Test the ODBC connection in the Linux box. Before you create remote sources for SAP HANA Studio, you should test the ODBC connection in the Linux box to ensure the connection has been set up correctly. In the Linux server, you can use the following command to test the connection:
isql -v <DSN> <user name> <password>
-
Provide the following parameters to connect to the database:
- DSN: Enter the data source name defined in odbc.ini file.
- Enter the user name and password.
-
If the Unix ODBC connection has been set up correctly, the result should look something like Figure 8.2.
Figure 8.2 Testing ODBC Connection
8.1.3 Create Remote Source via Smart Data Access Adapter
The following privileges are required in SAP HANA to manage the agent and adapters and to create remote sources and virtual tables in SAP HANA Studio:
-
System privileges:
- AGENT ADMIN ADAPTER ADMIN
- CREATE REMOTE SOURCE
-
SQL object privileges on the remote source:
- CREATE VIRTUAL TABLE
- DROP
Once the SAP HANA administrator has granted those roles to you, you can create remote sources in SAP HANA Studio to connect the external databases defined in odbc.ini in the Linux box, as in Section 8.1.2.
Log on to SAP HANA Studio, as shown in Figure 8.3, and go to Provisioning • Remote Sources. Right-click Remote Sources and select New Remote Source.
Figure 8.3 Create New Remote Source
A new screen will open, which should look something like Figure 8.4 and should include the following parameters:
-
Source Name
Specify the name of the remote source. -
Adapter Name
Choose an adapter from the predefined adapter list, as shown in Figure 8.5. SAP provides a wide range of adapters: Oracle ODBC adapter, MSSQL ODBC adapter, HADOOP adapter, and so on.Figure 8.4 Create Remote Source
Figure 8.5 List of Adapters
-
Source Location
For an SDA adapter, the only option is indexserver.
The other parameters will change according to the adapter selected, which will be explained in more detail in the next few sections.
Create Remote Source Connection to SQL Server via SDA Adapter
With an SDA adapter, a remote source can be created to connect to the SQL Server 2012 database. The MSSQL (GENERIC ODBC) adapter is an SDA adapter and will be used to connect to the SQL Server database. The properties shown in Figure 8.6 need to be populated as follows:
-
Source Name
Enter the source name for your remote source. -
Adapter Name
Choose MSSQL (GENERIC ODBC) from the predefined adapter list. -
Source Location
The indexserver option will be used for an SDA adapter. -
Adapter Version
Select SQL Server 2012. -
Connection Mode
For MSSQL (GENERIC ODBC), a developer can only choose Data Source Name; for other adapters, more modes might be available. -
Data Source Name
Provide the data source name defined in odbc.ini.Figure 8.6 Properties to Create SQL Server Remote Source
-
DML Mode:
- Read-only Will only allow you to read into SAP HANA.
- Read-write Will allow you to write changes in virtual tables back to the original database.
-
Credential Mode
Choose Technical user. -
User Name and Password
Provide the user name and password of the remote SQL Server system for logon.
Note
If the database name isn’t specified in odbc.ini, only the default database will be available under the Provisioning folder in SAP HANA Studio.
You can follow Figure 8.6 to create remote sources using the graphic method, but you can also create remote source connections to remote databases with SQLScript. Listing 8.1 provides a template for SQLScript to create remote sources for SQL server; it’s been tested in SAP HANA SPS 11. You can copy the script to the SQL console, change the parameters accordingly, and execute the script to create the remote source connection to SQL server.
CREATE REMOTE SOURCE <input your remote source name>
ADAPTER "odbc"
CONFIGURATION FILE 'property_mss.ini'
CONFIGURATION 'DSN=<input DSN>'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=<Input username>; password=<Input the password>';
Listing 8.1 Create Remote Source for MS SQL Server
Create Remote Source Connection to Hadoop
SDA can support access to Hadoop remotely. To connect to a Hadoop system, the Simba ODBC driver needs to be installed in the Linux box on the SAP HANA server; the driver can be downloaded from http://startupfocus.saphana.com/spark/spark-sap-hana-integration/.
Figure 8.7 illustrates the role that ODBC drivers play in the communication between SAP HANA and a Hadoop cluster.
Figure 8.7 Connect Hadoop via ODBC
In this case, choose HADOOP (ODBC) for Adapter Name, select ATA Source Name for Connection Mode, specify the Data Source Name as defined in odbc.ini, and provide the User Name and Password to connect the Hadoop system from SAP HANA Studio.
You can also choose the SPARK SQL adapter to create the remote source. See Chapter 9, Section 9.3.2 on NLS configuration for more details.
Figure 8.8 shows the properties to create a Hadoop remote source.
Figure 8.8 Properties to Create Hadoop Remote Source
You can also create remote source connection to a remote Hadoop system with SQLScript. Listing 8.2 provides an SAP HANA SQL script template to create a remote source for Hadoop; it’s been tested in SAP HANA SPS 11. You can copy the script to the SAP HANA SQL console, change the parameters accordingly, and execute it to create the remote source connection for Hadoop.
CREATE REMOTE SOURCE <input your remote source name for hadoop>
ADAPTER "hiveodbc"
CONFIGURATION 'DSN=<specify DSN defined in odbc.ini file>'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=<provide the username>; password=<provide the password>';
Listing 8.2 Create Remote Source for Hadoop