5.3 Introduction to Open SQL Statements
SAP supports different relational database management systems (RDBMSs) like Oracle, Microsoft SQL, and SAP HANA. Structured Query Language (SQL) is a programming language designed for working with data in a relational database management system.
ABAP supports two types of SQL:
-
Open SQL
Open SQL allows you to access database tables and perform certain actions independent of the underlying database that the SAP ERP system is using. When executing an Open SQL statement, the database interface of the work process will take care of converting the Open SQL statement to a Native SQL statement—that is, native to the underlying database. This ensures your ABAP code is portable and will work irrespective of the underlying database. -
Native SQL
Native SQL allows you to use database-specific SQL statements in an ABAP program. With a Native SQL statement, you can use database tables that are not part of ABAP Data Dictionary. This allows you to integrate data that is not part of the SAP ERP system. To use a Native SQL statement in an ABAP program, precede the statement with the EXEC SQL keyword and close it with the ENDEXEC statement—for example:EXEC SQL.
<native sql statement>.
ENDEXEC.
SQL statements are categorized into the following languages:
-
Data Definition Language (DDL)
DDL statements are used to define the database structure or schema. Using a DDL statement, you can perform actions such as creating a table, dropping a table, changing the structure of the database table, and so on.
The following are some examples of DDL statements:- CREATE: Creates objects in the database
- ALTER: Alters the structure of the database
- DROP: Deletes objects from the database
- TRUNCATE: Removes all records from a table, including all spaces allocated for the records
- RENAME: Renames an object
-
Data Manipulation Language (DML)
DML statements manage data within schema objects. Using DML statements, you can’t alter the database schema but you can manipulate the data. For example, you can fetch the data from a database table, modify the records of the database table, delete records from the database table, and so on.
The following are some examples of DML statements:
With Open SQL, you can use DML statements. With Native SQL, you can use DDL statements. We generally use ABAP Data Dictionary to perform tasks that require using DDL statements with Native SQL.
In this section, we’ll discuss various key concepts of an RDBMS, such as database design, tables and keys, foreign key relationships, and normalization. We’ll also provide an introduction to Open SQL statements used to fetch data from database tables.
5.3.1 Database Overview
As discussed in Chapter 2, the database is managed by an RDBMS. A relational database consists of multiple tables that store particular sets of data, and an RDBMS standardizes how the data is stored and processed. In a relational database, data is stored in multiple tables, with a relationship between the tables.
The relational database model was first conceived by E. F. Codd in 1969. The model is based on set theory and predicate logic (branches of mathematics). The idea behind the relational model is that the database consists of a series of tables that can be manipulated using a declarative approach (nonprocedural operations).
It’s commonly thought that the word relational in a relational database pertains to the fact that you relate the tables in the database to one another. However, a table is also known as a relation, and the word relational takes its root from this. In fact, Codd and other database theorists use the terms relations, attributes, and tuples to refer to what developers call tables, columns (fields), and rows (or what we’d call files, fields, and records in a physical sense).
In the following subsections, we’ll explain how a database is modeled. We’ll start by discussing the use of database tables and the significance of table keys. We’ll then explore the concept of foreign key relationships to establish relationships between different tables of the database. We’ll also discuss the concept of normalization, which helps to simplify database design and avoid redundancy. We’ll conclude this section with an introduction to SAP HANA and the many benefits this database provides.
Relational Database Design
A database model to reflect a real-world system takes time and effort to conceive, build, and refine. You need to make decisions about the tables you need to create, what fields they’ll contain, and how the tables can be related to each other.
A good database that’s designed according to the relational model provides many advantages. A good design makes the data entries, updates, and deletions efficient, and it also helps make retrieval and summarization of data easy. Because most data is stored in the database and not in the application, the database should be sufficiently self-documenting. A good database design should also consider future changes in real-world requirements and should be flexible to allow for changes to the database schema.
Tables and Keys
A table is the basic entity in a relational model. Each table should represent an entity in the real world, and these entities can be real-world objects or events. For example, a customer is a real-world object, and the order placed by a customer is an event.
A table consists of rows and columns. The relational model dictates that each row in a table should be unique so that it can be uniquely addressed through programming. The uniqueness of a row is guaranteed by defining a primary key for the table. The primary key of a table consists of one or more fields that uniquely identify a row. Each table can have only one primary key. All fields from which a primary key is drawn are called candidate keys. Multiple secondary keys can be drawn from the other fields of the table.
Secondary keys can be unique or non-unique. Secondary keys are generally used to define secondary indexes for the table in order to speed up data retrieval. Keys can be simple or composite. If a key is made up of one field, it’s called a simple key; if the key is made up of two or more fields, it’s called a composite key.
When modeling a table, you should decide which candidate keys form the primary key of the table based on business requirements; there are no hard and fast rules to define the primary key of a table. In his book titled SQL and Relational Basics, Fabian Pascal notes that such a decision should be based upon the principles of minimalism (choose the fewest fields necessary), stability (choose a key that seldom changes), and simplicity/familiarity (choose a key that is both simple and familiar to users).
For example, if you have a table to store employee details—such as employee ID, first name, last name, address, telephone number, and zip code—which of these fields will make for a good primary key? Following Pascal’s guidelines, the address and telephone number can be ruled out because they change frequently. There’s also a chance of the employee’s name changing due to marriage or other reasons; plus, names can be misspelled, and there’s a high likelihood that two employees will have the same first or last name.
Because the employee ID is unique across the organization, it makes the best candidate for the primary key. Even though there are no fixed rules for choosing a primary key, most developers prefer choosing a numeric primary key, because searches on numeric fields are more efficient than those on text fields.
Tip
In many situations, it’s recommended to use static numbers, such as employee ID, customer ID, document number, and so on, as primary keys and to avoid descriptive texts.
Foreign Keys and Domains
Even though primary keys are defined in individual tables, there may be a need to define a relationship with different tables of the database using foreign key relations. A foreign key is a field in a table that is used to reference a primary key in another table.
Let’s look at an example. Table 5.2 shows a customer table in which customer details are stored. The primary key of this table is the Cust ID column, which uniquely identifies each row.
Cust ID | Name | Address | ZIP | Phone |
---|---|---|---|---|
1 | Joey | 171 CE | 110099 | 99889900 |
2 | James | 345 DE | 118899 | 88997788 |
3 | John | 563 WE | 442299 | 88993354 |
Table 5.3 shows an orders table that stores information about customer orders. The primary key of this table is the Order ID column, which uniquely identifies each order.
Order ID | Cust ID | Item | Qty | UoM |
---|---|---|---|---|
2230 | 1 | Pen | 10 | Pieces |
2231 | 2 | Pencil | 10 | Pieces |
In this example, the Cust ID field of Table 5.3 is considered the foreign key of the table, because it can be used to refer to the customer table (see Table 5.2).
It’s important that both the primary keys and foreign keys that are used to form a relationship share the same meaning and domain. A domain defines a possible set of values for a field.
For example, if a valid value for a customer ID can be a number between 1 and 10,000, then the Cust ID fields in both tables (Table 5.2 and Table 5.3) should adhere to this range. By implementing a foreign key check, we can also ensure that the Cust ID field in the orders table (Table 5.3) consists of only the values available in the Cust ID field of the customers table (Table 5.2).
Relationship
Foreign keys model relationships between real-world entities. Such real-world entities can have complex relationships—for example, one entity may have multiple relations with other entities. In a relational database, relationships are defined between a pair of tables.
These tables can be related in one of three ways:
-
One-to-one relationship
Two tables are said to be in a one-to-one relationship if for every row in the first table there is at most one row in the second table. One-to-one relationships are seldom used to model real-world entities; they are mostly used to split the data into multiple tables due to software limitations.
For example, you can split data into two tables if the number of fields exceeds the limitation of 249 per table, or you can model two tables with customer data if one of the tables can store more sensitive information about a customer. You have better access to control restrictions for this table compared to the table that stores general information about the customer. The tables in a one-to-one relationship should have similar primary keys. -
One-to-many relationship
Two tables are said to be in a one-to-many relationship if for every row in the first table there can be zero, one, or multiple rows in the second table, but for every row in the second table there is exactly one row in the first table.
One-to-many relations are the most commonly modeled. The tables in a one-to-many relation are also called parent-child tables or header-item tables. Common tables with one-to-many relationships in the SAP system include order tables such as VBAK-VBAP, which stores header and item details of sales orders, and EKKO-EKPO, which stores the header and item details of purchase orders. The one-to-many relationship is also used to link a base table with a lookup table. For example, in the orders table you can store a two-character abbreviation for a unit of measure, and this info can be linked to a lookup table in which the descriptions for the abbreviations are maintained. -
Many-to-many relationship
Two tables are said to be in a many-to-many relationship when for every row in the first table there can be many rows in the second table, and for every row in the second table there can be many rows in the first table. Many-to-many relations cannot be directly modeled in a relational database, and they’re typically broken down into multiple one-to-many relations.
Normalization
Normalization is the process of simplifying the design of a database. Normalization answers core questions regarding the number of tables required, what each table represents, the number of fields in each table, and the relationship between tables. Normalization also helps avoid redundancy and anomalies in database design.
Anomalies can occur while inserting, updating, or deleting records from a database table (if the database is poorly designed). For example, if you have a database table as shown Table 5.4, when a record is updated to change the customer address, all the rows where the customer record exists should be updated. If any rows are missed, then it will lead to data inconsistency. This is called an update anomaly.
Item_ID | Customer_ID | Customer_name | Customer_address |
---|---|---|---|
1 | 921 | Ryan | 12E West End |
2 | 921 | Ryan | 12E West End |
3 | 728 | John | 44 Avenue Road |
If you try to insert a record for a customer that isn’t available in the customer master table, it causes an insertion anomaly. If you delete item 3 in Table 5.4, it will also delete the record of the customer John, because his details are linked to this item only. This is called a delete anomaly.
Normalization helps to overcome these anomalies and model a database that is consistent and predictable. Normalizations use the concept of normal forms to assist in designing an optimal structure. There are three available normal forms: first, second, and third.
The following subsections look at each of these normal forms.
First Normal Form
The first normal form (1NF) dictates that the field of a table should contain atomic values. The field should contain only one value. For example, the data in Table 5.5 does not conform to 1NF, because the phone number field contains multiple values for the Emp_id 2290.
Emp_id | Emp_name | Phone_num |
---|---|---|
2289 | John | 99889988 |
2290 | Mark |
88998899 99887755 |
To make Table 5.5 conform to 1NF, it should be adjusted as shown in Table 5.6 by creating a new record for the second phone number in Emp_id 2290.
Emp_id | Emp_name | Phone_num |
---|---|---|
2289 | John | 99889988 |
2290 | Mark | 88998899 |
2290 | Mark | 99887755 |
Second Normal Form
For a table to conform to the second normal form (2NF), it should meet the following criteria:
- It should conform to 1NF.
- The non-key fields of the table should be fully dependent on all the primary key fields and not on the subset of the key fields.
For example, in Table 5.7, let’s assume that Employee_ID and Department_ID are the primary key fields. According to 2NF, the non-primary fields Employee_name and Department_name should be fully dependent on the complete primary key fields Employee_ID and Department_ID, not on the subset of the primary key fields.
That’s not the case here; the Employee_name field can be identified by the Employee_ID field, and the Department_name field can be identified fully by the Department_ID field. Therefore, the non-primary fields of this table are only partially dependent on the primary key fields of the table, which is not allowed in 2NF.
Employee_ID | Department_ID | Employee_name | Department_name |
---|---|---|---|
1122 | 01 | John | Sales |
1123 | 02 | Mark | Finance |
To make Table 5.7 conform to 2NF, it should be split into two tables, as shown in Table 5.8 and Table 5.9. This ensures that there is no partial dependency on primary key fields.
Employee_ID | Department_ID | Employee_name |
---|---|---|
1122 | 01 | John |
1123 | 02 | Mark |
Department_ID | Department_name |
---|---|
01 | Sales |
02 | Finance |
Third Normal Form
A table conforms to the third normal form (3NF) if it conforms to 2NF and all non-key fields of the table are mutually independent.
For example, Table 5.10 does not conform to third normal form, because the field Total is dependent on the information in the nonkey fields Quantity and Price. To make this table conform to 3NF, it’s best to remove the Total field from the table and handle the calculations in a query or report. This ensures that updating one of the fields of the table does not result in any anomalies.
Item_id | Quantity | Price | Total |
---|---|---|---|
1 | 10 | 10 | 100 |
2 | 20 | 15 | 300 |
Every higher normal form is a superset of all lower normal forms. For example, if you design for 3NF, then by default it conforms to 1NF and 2NF. Normalization helps achieve optimal database design. A good database design takes time and effort and requires a good understanding of business requirements. A good design not only makes your applications efficient, but also prevents headaches down the line.
SAP HANA
There has been a growing need for real-time analytics over the years, which requires processing huge sets of data. Traditional database systems form data processing speed bottlenecks for large data volumes. SAP addressed this problem with its in-memory database system called SAP HANA.
SAP HANA has transformed the relational database industry with its innovations in hardware and software technology to process massive datasets in real time using in-memory computing. SAP HANA combines database, application processing, and integration services on a single platform.
SAP HANA stores compressed data in memory in a columnar format that enables faster scanning and quicker processing. Compared to traditional databases, programs take minutes to process the data from SAP HANA, as opposed to hours when using a traditional database.
The following are some important features to consider with SAP HANA when it comes to ABAP:
-
Parallelism
One of the main features of the SAP HANA database is parallelism. SAP HANA takes advantage of the multiple cores of the processor to process information in parallel. Traditional database systems were built decades ago, when only single-core processors existed and the DRAM was extremely expensive.
The hardware and computing power of systems has changed tremendously over the years. SAP designed SAP HANA from the ground up to take advantage of modern hardware capabilities. Typically, an SAP HANA system performs around 3.5 billion scans per second per core and around twelve to fifteen million aggregations per second per core. This enables you to perform real-time analytics and process huge sets of data on the fly. -
Code pushdown
The SAP HANA database allows you to push certain data-intensive operations to the database using a technique known as code pushdown.
For example, if you want to calculate the sum of all invoices, you can use an aggregate function on the database instead of fetching all the invoice data into the application server and calculating the sum in an internal table loop. This can be achieved using ABAP Core Data Services (CDS) views. -
ABAP Core Data Services (CDS) views
ABAP CDS views are not persistent objects, but a projection of a database entity. An ABAP CDS view is created as a design-time file in the repository and exists as an ABAP Data Dictionary object once activated. This allows you to access the CDS view using Open SQL and to use aggregate functions.
5.3.2 Selecting Data from Database Tables
You can use internal tables to process data from a database table, allowing you to store multiple rows of data at once. In this section, we’ll explore a few Open SQL statements that can be used to process data from a database.
The SELECT keyword fetches data from a database table. You can select a single record or multiple records, and you can select the data from the database into a structure or an internal table of your ABAP program. You can either select the complete row (all fields) of a database table or select only specific fields.
UNION Addition
As of SAP NetWeaver 7.5, the UNION addition can be used to create a union between the result sets of two SELECT statements.
When selecting data from a database, the row of the data object should be identical to the row of the database table. To ensure this, refer your data object to the table when selecting the complete row. If you need to select only certain fields from the row, make sure to define the fields in the same order in which you plan to select them. The SELECT statement supports an INTO CORRESPONDING FIELDS addition if the order of fields in your data object doesn’t match the order of the selection.
Listing 5.23 shows the usage of a SELECT SINGLE statement to select a single row from the database table. If more than one row satisfies the WHERE clause, the first matching row is fetched.
TYPES: BEGIN OF ty_vbrk,
VBELN TYPE VBELN_VF, "Document Number
FKDAT TYPE FKDAT, "Bill Date
NETWR TYPE NETWR, "Net Value
KUNRG TYPE KUNRG, "Payer
END OF ty_vbrk.
DATA : it_vbrk TYPE STANDARD TABLE OF ty_vbrk,
wa_vbrk TYPE ty_vbrk.
*Selecting a single row of specific fields from table vbrk.
SELECT SINGLE vbeln fkdat netwr kunrg FROM vbrk
INTO wa_vbrk
WHERE vbeln EQ '9000'.
Listing 5.24 uses SELECT…ENDSELECT to fetch all matching rows, one row at a time. The system loops between SELECT…ENDSELECT, transferring one row at a time to the structure wa_vbrk. You can process the contents of wa_vbrk in each loop by writing the statements before ENDSELECT. This is an efficient way to process the records from the database table if you plan to use a database table only once during program execution.
However, if you plan to reuse the data multiple times during the life of the program execution, we recommend using internal tables, which allows you to fetch the data once into your program and access it subsequently multiple times, as opposed to accessing the database every time you need the same data.
*Selecting all matching rows of specific fields from the table
*vbrk using a structure.
*Here the system fetches one row at a time and loops between
*select…endselect. The row can be processed between SELECT…ENDSELECT.
SELECT vbeln fkdat netwr kunrg FROM vbrk
INTO wa_vbrk
WHERE vbeln EQ '9000'.
ENDSELECT.
Listing 5.25 shows using an internal table in a SELECT statement, also known as performing an array fetch. Once the SELECT statement is executed, you can access the data from the internal table as discussed previously.
*Selecting all matching rows of specific fields from the table
*vbrk using an internal table.
SELECT vbeln fkdat netwr kunrg FROM vbrk
INTO TABLE it_vbrk
WHERE vbeln EQ '9000'.
Listing 5.26 shows selecting the fields of the complete table row. Notice the difference in data object declaration and the usage of * as the SELECT statement.
DATA : it_vbrk TYPE STANDARD TABLE OF vbrk,
wa_vbrk TYPE vbrk.
*Selecting a single row of all fields from table vbrk.
SELECT SINGLE * FROM vbrk
INTO wa_vbrk
WHERE vbeln EQ '9000'.
*Selecting all matching rows of all fields from table vbrk
*using a structure.
SELECT * FROM vbrk
INTO wa_vbrk
WHERE vbeln EQ '9000'.
ENDSELECT.
*Selecting all matching rows of all fields from table vbrk
*using an internal table.
SELECT * FROM vbrk
INTO TABLE it_vbrk
WHERE vbeln EQ '9000'.
5.3.3 Selecting Data from Multiple Tables
The syntax we’ve discussed so far allows you to select data from a single table. However, in real-world applications, data is spread across multiple tables with foreign keys.
To select data from tables that are in a foreign key relationship, you can use joins or use the FOR ALL ENTRIES addition with the SELECT statement. The FOR ALL ENTRIES addition allows you to select the data from the database table and compare it to the data in an internal table that was previously selected from another database table. This is efficient if you’re processing data from two tables.
Listing 5.27 shows the syntax to use for all entries. Here, the data is first selected from database table VBRK into an internal table IT_VBRK; then the data in IT_VBRK is compared to select the data from database table VBRP.
When using FOR ALL ENTRIES, make sure the internal table used for comparison isn’t empty. Otherwise, the system will execute the SELECT statement without restriction. Notice the IF condition used in the listing.
SELECT * FROM vbrk
INTO TABLE it_vbrk
WHERE vbeln EQ p_vbeln.
IF sy-subrc IS INITIAL.
SELECT * FROM vbrp
INTO TABLE it_vbrp
FOR ALL ENTRIES IN it_vbrk
WHERE vbeln EQ it_vbrk-vbeln.
ENDIF.
Alternately, joins allow you to select data from up to twenty-five database tables. There are two types of joins:
-
Inner joins
Inner joins require an entry with the KEY of one table in the second table, to be extracted to an internal table. -
Outer joins
Irrespective of whether an entry exists or not in the second table, outer joins still extract data from the first table.
Listing 5.28 highlights the difference between an inner and outer join. If the internal table has three fields—FLD1, FLD2, and FLD3—the result for an inner join and outer join are shown in Listing 5.28.
Table A |
Table B |
||
---|---|---|---|
Field 1 |
Field 2 |
Field 1 |
Field 3 |
X |
100 |
X |
ABC |
Y |
200 |
X |
DEF |
Z |
300 |
Z |
WXY |
Inner Join:
-----------
FLD1 FLD2 FLD3
--------------------
X 100 ABC
X 100 DEF
Z 300 WXY
Outer Join:
-----------
FLD1 FLD2 FLD3
--------------------
X 100 ABC
X 100 DEF
Y 200 "No entry for Key Y in table B so FLD3 is blank
Z 300 GHI
Listing 5.29 shows an example using an inner join. Here, we join the columns carrname, connid, and fldate of the database tables scarr, spfli, and sflight using two inner joins. This creates a list of flights from p_cityfr to p_cityto. An alias name is assigned to each table.
PARAMETERS: p_cityfr TYPE spfli-cityfrom,
p_cityto TYPE spfli-cityto.
TYPES: BEGIN OF wa,
fldate TYPE sflight-fldate,
carrname TYPE scarr-carrname,
connid TYPE spfli-connid,
END OF wa.
DATA itab TYPE SORTED TABLE OF wa
WITH UNIQUE KEY fldate carrname connid.
SELECT c~carrname p~connid f~fldate
FROM ( ( scarr AS c
INNER JOIN spfli AS p ON p~carrid = c~carrid
AND p~cityfrom = p_cityfr
AND p~cityto = p_cityto )
INNER JOIN sflight AS f ON f~carrid = p~carrid
AND f~connid = p~connid )
INTO CORRESPONDING FIELDS OF TABLE itab.
Listing 5.30 shows an example using an outer join.
PARAMETERS p_cityfr TYPE spfli-cityfrom.
TYPES: BEGIN OF wa,
carrid TYPE scarr-carrid,
carrname TYPE scarr-carrname,
connid TYPE spfli-connid,
END OF wa.
DATA itab TYPE SORTED TABLE OF wa
WITH NON-UNIQUE KEY carrid.
START-OF-SELECTION.
SELECT s~carrid s~carrname p~connid
FROM scarr AS s
LEFT OUTER JOIN spfli AS p ON s~carrid = p~carrid
AND p~cityfrom = p_cityfr
INTO CORRESPONDING FIELDS OF TABLE itab.
Using joins may seem a bit overwhelming initially, so we suggest you practice using FOR ALL ENTRIES and use joins only if you need data from many tables.