10.2Types of Text Data and Full Text Indexes in SAP HANA
The fuzzy search in SAP HANA is based on the data types in the column store. Here, TEXT and SHORTTEXT represent two specific data types that are dedicated for text searches (and text analyses). The SHORTTEXT data type is used for character strings of a given length (similar to NVARCHAR), whereas TEXT represents a large object (similar to NCLOB—the SQL data type for a string in the ABAP Data Dictionary [DDIC]). In this context, texts are internally fragmented into tokens that form the basis for searches and analyses. The following sections provide detailed information about this subject.
Unfortunately, however, there is currently no native support available in ABAP for the TEXT and SHORTTEXT data types, which makes it impossible to create a table via the DDIC that uses these data types. In addition, although the fuzzy search function is also supported for other data types (e.g., VARCHAR and NVARCHAR), this support isn’t extensive enough. Without the ability to split the texts into searchable tokens, the system can’t recognize a permutation of words, which is a standard in modern search applications. Similarly, the reversion to their word stem doesn’t work directly with these data types.
What you can do, however, is add the functionality offered by the text data types to a specific column by creating a full text index. In this way, you can enable the text search and text analysis functions for the majority of character-type DDIC types (including CHAR, STRING, DATS, etc.). When you create a full text index for a table column, the system creates an internal, invisible column (shadow column) of the TEXT type, which contains the same data but in a presentation optimized for search requests. In this context, the text is fragmented into tokens, and an additional dictionary is generated.
Figure 10.2 shows the internal presentation in a schema based on the example of airline names.
Figure 10.2Schematic Presentation of the Full Text Index
Note that the shadow column exists only transiently in the main memory. When you load the table into the memory (e.g., after a database restart), this data structure is created anew. Section 10.6 contains further details about the memory consumption of a full text index.
You can create the full text index using the CREATE FULLTEXT INDEX SQL statement:
ON <table name> ( <column name> )
[<parameter list>]
Here, you can use numerous optional settings, as described in the following using examples. However, you can obtain comprehensive documentation from http://help.sap.com/hana. You should also note that the name of the full text index must be unique within a schema, so it makes sense to prefix the index name with the table name to avoid name clashes.
The following SQL statement defines a full text index for the CARRNAME column of table SCARR:
Because you can’t create full text indexes via the DDIC (Transaction SE11) prior to ABAP Release 7.4, these indexes can’t be transported automatically. As of ABAP Release 7.4, it’s also possible to create a full text index via the DDIC using common parameters. For this purpose, you must define a new index for a table using Transaction SE11 or, rather, an extension index (for a modification-free extension of an SAP standard table). This contains only the required column as a field and is created exclusively in the SAP HANA database. Figure 10.3 shows this type of index in the CITY column of table SCUSTOM.
Via Goto • Full Text Index, you can then activate the full text index and set a variety of parameters (see Figure 10.4). The parameters correspond to the CREATE FULLTEXT INDEX statement parameters mentioned earlier. The standard settings and a language configuration (using a column of the table or a fixed language) are usually sufficient for a fuzzy search.
Figure 10.3Creating a Full Text Index via the DDIC
Figure 10.4Configuring a Full Text Index via the DDIC
In addition to using the DDIC, you can also use Native SQL in an ABAP program to create a full text index. This allows you to use the entire range of options available for a text search (also those that can’t be selected via the configuration screen in Transaction SE11); however, this method requires you to manage the index in a system landscape yourself. This variant can also be useful in older ABAP releases or in side-by-side scenarios. Listing 10.1 shows how you can create and remove full text indexes using the ABAP Database Connectivity (ADBC) interface:
" Configuration
PARAMETERS:
table LIKE dd02l-tabname DEFAULT 'SCUSTOM',
column LIKE dd03l-fieldname DEFAULT 'NAME',
fzyidx TYPE abap_bool AS CHECKBOX DEFAULT abap_false,
ta TYPE abap_bool AS CHECKBOX DEFAULT abap_false,
taconfig TYPE string DEFAULT 'EXTRACTION_CORE',
drop TYPE abap_bool AS CHECKBOX DEFAULT abap_true,
create TYPE abap_bool AS CHECKBOX DEFAULT abap_true.
" Index name (<table>~<column>)
DATA(lv_idx) = table && '~' && column.
" SQL statement for creating a full text index
DATA(lv_sql) = |CREATE FULLTEXT INDEX { lv_idx } |
&& |ON { table }({ column })|.
" Additional fuzzy search index
IF ( fzyidx = abap_true ).
lv_sql = lv_sql && ' FUZZY SEARCH INDEX ON'.
ENDIF.
" Text analysis
IF ( ta = abap_true ).
lv_sql = lv_sql && ' TEXT ANALYSIS ON'.
" Special configuration of text analysis
IF ( taconfig IS NOT INITIAL ).
lv_sql = lv_sql && | CONFIGURATION '{ taconfig }'|.
ENDIF.
ENDIF.
IF ( drop = abap_true ).
TRY.
" Remove index
cl_sql_connection=>get_connection(
)->create_statement( )->execute_ddl(
|DROP FULLTEXT INDEX { lv_idx }|
).
WRITE: / |Fulltext index { lv_idx } removed|.
CATCH cx_sql_exception INTO DATA(lo_ex).
" Error handling
WRITE: / | Error: { lo_ex->get_text( ) }|.
ENDTRY.
ENDIF.
IF ( create = abap_true ).
TRY.
" Create text index via ADBC
cl_sql_connection=>get_connection(
)->create_statement(
)->execute_ddl( lv_sql ).
WRITE: / |Fulltext index { lv_idx } created|.
CATCH cx_sql_exception INTO DATA(lo_ex1).
" Error handling
WRITE: / | Error: { lo_ex1->get_text( ) }|.
ENDTRY.
ENDIF.
Listing 10.1Creating a Text Index via ADBC
You can view existing full text indexes when you open a table and click on the Indexes tab in SAP HANA Studio (see Figure 10.5). Here you can view technical characteristics such as the synchronization behavior.