SAP PRESS :: ABAP® Development for SAP HANA® - Read and Write Access in the Column Store

C.4Read Accesses with an Index

Now that you’ve learned about accesses without an index, we’ll turn our attention to accesses with an index. As mentioned, read accesses involving very large tables can, despite compression and parallelism, be too slow if these are executed very frequently. In such cases, an index should be created so you won’t have to scan the entire column. In SAP HANA, a distinction is made between the following two types of indexes:

Inverted index
Inverted indexes refer to only one column. Here, the index data are stored in internal memory structures that belong to the respective column—namely, the index offset vector and the index position vector. For each value in the dictionary vector, the index offset vector stores the position of this value’s first occurrence in the index position vector. The index position vector contains the row ID assigned to the data record in the attribute vector. The index position vector is sorted according to the indexed column and uses the row ID to reference the attribute vector.
Composite index
Composite indexes refer to more than one column. First, the contents of these columns are grouped together in an internal column, and an inverted index is then created for this internal column.

Let’s discuss read accesses for these two index categories.

As an example, we’ll create an index for the Name column. Because it concerns only one column, we’ll create an inverted index as shown in Figure C.5.

Figure C.5Inverted Index for the “Name” Field

We’ll continue to use the example from the previous sections, namely the WHERE NAME = 'Alex' condition. When there is no index, the entire attribute vector must be searched for the value determined from the dictionary vector.

With an inverted index, however, this is no longer necessary. The search process is then as follows:

1	First, in the dictionary vector, a binary search is performed to determine the value for “Alex”. This concerns the value “1” because “Alex” occupies first position in the dictionary vector.
2	Then, the index offset vector is checked to see which value occupies first position (and therefore contains information about the first value in the dictionary vector). In this case, the number “1” is stored there, which means that the positions of the value we require are stored in the first position in the index position vector.
3	Here, you find the following values in succession: “3” (in first position), “11” (in second position), and “13” (in third position). These values are the positions (row IDs) in the attribute vector occupied by “Alex” (the value “1”).
4	The search ends with the fourth entry in the index position vector because position 4 describes the end of the section being searched within the index offset vector. In other words, the value that lies after the value “Alex” in the dictionary vector occupies position 4 in the index position vector. Now, only the required columns from the other attribute vectors need to be read using the predetermined row IDs.

In the next example, we’ll create a composite index for the Gender and Name columns. In this case, an additional column is created in SAP HANA, and the contents of the Gender and Name columns are stored there together (see Figure C.6). As described earlier, an inverted index is created for this internal column, which isn’t visible in the ABAP Data Dictionary (DDIC).

Figure C.6Composite Index for the “Gender” and “Name” Fields

In our example, we’ll search the database table using the WHERE GENDER ='W' AND NAME = 'Tina' condition. When there was no index, the entire attribute vector for the Name column had to be searched.

This is no longer necessary. The search process is as follows:

1	In the dictionary vector for the internal column composed from the Gender and Name fields, a binary search is performed to determine the position. The required value occupies position 6.
2	In the index offset vector, the reference to the position in the index position vector (position 7 in our example) is obtained from position 6.
3	In the index position vector, the reference to the position in the attribute vector (position 9 in our case) is at position 7.
4	We’ve now determined row IDs for the attribute vectors whose columns are required for the `SELECT`. [»]Write Accesses with an Index Inverted and composite indexes occur in both the main store and the delta store. As with classic databases, write accesses in the delta store are more labor-intensive because the indexes have to be maintained. For a composite index, write accesses require more time and effort than with an inverted index because composite indexes require that more memory structures be maintained.