Scalar Functions

The structure and purpose of SQL scalar functions are similar to C functions or traditional mathematical functions. The caller provides a series of function parameters and the function computes and returns a value. Sometimes these functions are purely functional (in the mathematical sense), in that they compute the result based purely off the parameters with no outside influences. In other cases, the functions are more procedural in nature, and are called to invoke specific side effects.

The body of a function can do pretty much anything you want, including calling out to other libraries. For example, you could write a function that allows SQLite to send email or query the status of a web server all through SQL functions. Your code can also interact with the database and run its own queries.

Although scalar functions can take multiple parameters, they can only return a single value, such as an integer or a string. Functions cannot return rows (a series of values), nor can they return a result set, with rows and columns.

Scalar functions can still be used to process sets of data, however. Consider this SQL statement:

SELECT format( name ) FROM employees;

In this query, the scalar function format() is applied to every row in the result set. This is done by calling the scalar function over and over for each row, as each row is computed. Even though the format() function is only referenced once in this SQL statement, when the query is executed, it can result in many different invocations of the function, allowing it to process each value from the name column.

To create a custom SQL function, you must bind an SQL function name to a C function pointer. The C function acts as a callback. Any time the SQL engine needs to invoke the named SQL function, the registered C function pointer is called. This provides a way for an SQL statement to call a C function you have written.

These functions allow you to create and bind an SQL function name to a C function pointer:

SQLite allows SQL function names to be overloaded based off both the number of parameters and the text representation. This allows multiple C functions to be associated with the same SQL function name. You can use this overloading capability to register different C implementations of the same SQL function. This might be useful to efficiently handle different text encodings, or to provide different behaviors, depending on the number of parameters.

You are not required to register multiple text encodings. When the SQLite library needs to make a function call, it will attempt to find a registered function with a matching text representation. If it cannot find an exact match, it will convert any text values and call one of the other available functions. The value SQLITE_ANY indicates that the function is willing to accept text values in any possible encoding.

You can update or redefine a function by simply reregistering it with a different function pointer. To delete a function, call sqlite3_create_function_xxx() with the same name, parameter count, and text representation, but pass in NULL for all of the function pointers. Unfortunately, there is no way to find out if a function name is registered or not, outside of keeping track yourself. That means there is no way to tell the difference between a create action and a redefine action.

It is permissible to create a new function at any time. There are limits on when you can change or delete a function, however. If the database connection has any prepared statements that are currently being executed (sqlite3_step() has been called at least once, but sqlite3_reset() has not), you cannot redefine or delete a custom function, you can only create a new one. Any attempt to redefine or delete a function will return SQLITE_BUSY.

If there are no statements currently being executed, you may redefine or delete a custom function, but doing so invalidates all the currently prepared statements (just as any schema change does). If the statements were prepared with sqlite3_prepare_v2(), they will automatically reprepare themselves next time they’re used. If they were prepared with an original version of sqlite3_prepare(), any use of the statement will return an SQLITE_SCHEMA error.

The actual C function you need to write looks like this:

In a threaded application, it may be possible for different threads to call into your function at the same time. As such, user-defined functions should be thread-safe.

Most user-defined functions follow a pretty standard pattern. First, you’ll want to examine the sqlite3_value parameters to verify their types and extract their values. You can also extract the user-data pointer passed into sqlite3_create_function_xxx(). Your code can then perform whatever calculation or procedure is required. Finally, you can set the return value of the function or return an error condition.

SQL function parameters are passed into your C function as an array of sqlite3_value structures. Each of these structures holds one parameter value.

To extract working C values from the sqlite3_value structures, you need to call one of the sqlite3_value_xxx() functions. These functions are extremely similar to the sqlite3_column_xxx() functions in use and design. The only major difference is that these functions take a single sqlite3_value pointer, rather than a prepared statement and a column index.

Like their column counterparts, the value functions will attempt to automatically convert the value into whatever datatype is requested. The conversion process and rules are the same as those used by the sqlite3_column_xxx() functions. See Table 7-1 for more details.

There are also a number of helper functions to query the native datatype of a value, as well as query the size of any returned buffers.

int sqlite3_value_type( sqlite3_value *value )

Returns the native datatype of the value. The return value can be one of SQLITE_BLOB, SQLITE_INTEGER, SQLITE_FLOAT, SQLITE_TEXT, or SQLITE_NULL. This value can change or become invalid if a type conversion takes place.

int sqlite3_value_numeric_type( sqlite3_value *value )

This function attempts to convert a value into a numeric type (either SQLITE_FLOAT or SQLITE_INTEGER). If the conversion can be done without loss of data, then the conversion is made and the datatype of the new value is returned. If a conversion cannot be done, the value will not be converted and the original datatype of the value will be returned. This can be any value that is returned by sqlite3_value_type().

The main difference between this function and simply calling sqlite3_value_double() or sqlite3_value_int(), is that the conversion will only take place if it is meaningful and will not result in lost data. For example, sqlite3_value_double() will convert a NULL into the value 0.0, while this function will not. Similarly, sqlite3_value_int() will convert the first part of the string '123xyz' into the integer 123, ignoring the trailing 'xyz'. This function will not, however, because no sense can be made of the trailing 'xyz' in a numeric context.

int sqlite3_value_bytes( sqlite3_value *value )

Returns the number of bytes in a BLOB or in a UTF-8 encoded string. If returning the size of a text value, the size will include the null-terminator.

int sqlite3_value_bytes16( sqlite3_value *value )

Returns the number of bytes in a UTF-16 encoded string, including the null-terminator.

As with the sqlite3_column_xxx() functions, any returned pointers can become invalid if another sqlite3_value_xxx() call is made against the same sqlite3_value structure. Similarly, data conversions can take place on text datatypes when calling sqlite3_value_bytes() or sqlite3_value_bytes16(). In general, you should follow the same rules and practices as you would with the sqlite3_column_xxx() functions. See Result Columns for more details.

In addition to the SQL function parameters, the sqlite3_context parameter also carries useful information. These functions can be used to extract either the database connection or the user-data pointer that was used to create the function.

The database connection returned by sqlite3_context_db_handle() can be used by the function to run queries or otherwise interact with the database.

Once a function has extracted and verified its parameters, it can set about its work. When a result has been computed, that result needs to be passed back to the SQLite engine. This is done by using one of the sqlite3_result_xxx() functions. These functions set a result value in the function’s sqlite3_context structure.

Setting a result value is the only way your function can communicate back to the SQLite engine about the success or failure of the function call. The C function itself has a void return type, so any result or error has to be passed back through the context structure. Normally, one of the sqlite3_result_xxx() functions is called just prior to calling return within your C function, but it is permissible to set a new result multiple times throughout the function. Only the last result will be returned, however.

The sqlite3_result_xxx() functions are extremely similar to the sqlite3_bind_xxx() functions in use and design. The main difference is that these functions take an sqlite3_context structure, rather than a prepared statement and parameter index. A function can only return one result, so any call to an sqlite3_result_xxx() function will override the value set by a previous call.

void sqlite3_result_blob( sqlite3_context* ctx, const void *data, int data_len, mem_callback )

Encodes a data buffer as a BLOB result.

void sqlite3_result_double( sqlite3_context *ctx, double data )

Encodes a 64-bit floating-point value as a result.

void sqlite3_result_int( sqlite3_context *ctx, int data )

Encodes a 32-bit signed integer as a result.

void sqlite3_result_int64( sqlite3_context *ctx, sqlite3_int64 data )

Encodes a 64-bit signed integer as a result.

void sqlite3_result_null( sqlite3_context *ctx )

Encodes an SQL NULL as a result.

void sqlite3_result_text( sqlite3_context *ctx, const char *data, int data_len, mem_callback )

Encodes a UTF-8 encoded string as a result.

void sqlite3_result_text16( sqlite3_context *ctx, const void *data, int data_len, mem_callback )
void sqlite3_result_text16be( sqlite3_context *ctx, const void *data, int data_len, mem_callback )
void sqlite3_result_text16le( sqlite3_context *ctx, const void *data, int data_len, mem_callback )

Encodes a UTF-16 encoded string as a result. The first function is used for a string in the native byte format, while the last two functions are used for strings that are explicitly encoded as big-endian or little-endian, respectively.

void sqlite3_result_zeroblob( sqlite3_context *ctx, int length )

Encodes a BLOB as a result. The BLOB will contain the number of bytes specified, and each byte will all be set to zero (0x00).

void sqlite3_result_value( sqlite3_context *ctx, sqlite3_value *result_value )

Encodes an sqlite3_value as a result. A copy of the value is made, so there is no need to worry about keeping the sqlite3_value parameter stable between this call and when your function actually exits.

This function accepts both protected and unprotected value objects. You can pass one of the sqlite3_value parameters to this function if you wish to return one of the SQL function input parameters. You can also pass a value obtained from a call to sqlite3_column_value().

Setting a BLOB or text value requires the same type of memory management as the equivalent sqlite3_bind_xxx() functions. The last parameter of these functions is a callback pointer that will properly free and release the given data buffer. You can pass a reference to sqlite3_free() directly (assuming the data buffers were allocated with sqlite3_malloc()), or you can write your own memory manager (or wrapper). You can also pass in one of the SQLITE_TRANSIENT or SQLITE_STATIC flags. See Binding Values for specifics on how these flags can be used.

In addition to encoding specific datatypes, you can also return an error status. This can be used to indicate a usage problem (such as an incorrect number of parameters) or an environment problem, such as running out of memory. Returning an error code will result in SQLite aborting the current SQL statement and returning the error back to the application via the return code of sqlite3_step() or one of the convenience functions.

It is possible to return both a custom error message and a specific error code. First, call sqlite3_result_error() (or sqlite3_result_error16()) to set the error message. This will also set the error code to SQLITE_ERROR. If you want a different error code, you can call sqlite3_result_error_code() to override the generic error code with something more specific, leaving the error message untouched. Just be aware that sqlite3_result_error() will always set the error code to SQLITE_ERROR, so you must set the error message before you set a specific error code.

Here is a simple example that exposes the SQLite C API function sqlite3_limit() to the SQL environment as the SQL function sql_limit(). This function is used to adjust various limits associated with the database connection, such as the maximum number of columns in a result set or the maximum size of a BLOB value.

Here’s a quick introduction to the C function sqlite3_limit(), which can be used to adjust the soft limits of the SQLite environment:

For more specific details on sqlite3_limit(), see sqlite3_limit() in Appendix G. You don’t need a full understanding of how this API call works to understand these examples.

Although the sqlite3_limit() function makes a good example, it might not be the kind of thing you’d want to expose to the SQL language in a real-world application. In practice, exposing this C API call to the SQL level brings up some security concerns. Anyone that can issue arbitrary SQL calls would have the capability of altering the SQLite soft limits. This could be used for some types of denial-of-service attacks by raising or lowering the limits to their extremes.

In order to call the sqlite3_limit() function, we need to determine the limit_type and value parameters. This will require an SQL function that takes two parameters. The first parameter will be the limit type, expressed as a text constant. The second parameter will be the new limit. The SQL function can be called like this to set a new expression-depth limit:

SELECT sql_limit( 'EXPR_DEPTH', 400 );

The C function that implements the SQL function sql_limit() has four main parts. The first task is to verify that the first SQL function parameter (passed in as values[0]) is a text value. If it is, the function extracts the text to the str pointer:

static void sql_set_limit( sqlite3_context *ctx, int
                                                 num_values, sqlite3_value **values )
{
    sqlite3      *db = sqlite3_context_db_handle( ctx );
    const char   *str = NULL;
    int           limit = -1, val = -1, result = -1;

    /* verify the first param is a string and extract pointer */
    if ( sqlite3_value_type( values[0] ) == SQLITE_TEXT ) {
        str = (const char*) sqlite3_value_text( values[0] );
    } else {
        sqlite3_result_error( ctx, "sql_limit(): wrong parameter type", -1 );
        return;
    }

Next, the function verifies that the second SQL parameter (values[1]) is an integer value, and extracts it into the val variable:

    /* verify the second parameter is an integer and extract value */
    if ( sqlite3_value_type( values[1] ) == SQLITE_INTEGER ) {
        val = sqlite3_value_int( values[1] );
    } else {
        sqlite3_result_error( ctx, "sql_limit(): wrong parameter type", -1 );
        return;
    }

Although our SQL function uses a text value to indicate which limit we would like to modify, the C function sqlite3_limit() requires a predefined integer value. We need to decode the str text value into an integer limit value. I’ll show the code to decode_limit_str() in just a bit:

    /* translate string into integer limit */
    limit = decode_limit_str( str );
    if ( limit == -1 ) {
        sqlite3_result_error( ctx, "sql_limit(): unknown limit type", -1 );
        return;
    }

After verifying our two SQL function parameters, extracting their values, and translating the text limit indicator into a proper integer value, we finally call sqlite3_limit(). The result is set as the result value of the SQL function and the function returns:

    /* call sqlite3_limit(), return result */
    result = sqlite3_limit( db, limit, val );
    sqlite3_result_int( ctx, result );
    return;
}

The decode_limit_str() function is very simple, and simply looks for a predefined set of text values:

int decode_limit_str( const char *str )
{
    if ( str == NULL ) return -1;    
    if ( !strcmp( str, "LENGTH"          ) ) return SQLITE_LIMIT_LENGTH;
    if ( !strcmp( str, "SQL_LENGTH"      ) ) return SQLITE_LIMIT_SQL_LENGTH;
    if ( !strcmp( str, "COLUMN"          ) ) return SQLITE_LIMIT_COLUMN;
    if ( !strcmp( str, "EXPR_DEPTH"      ) ) return SQLITE_LIMIT_EXPR_DEPTH;
    if ( !strcmp( str, "COMPOUND_SELECT" ) ) return SQLITE_LIMIT_COMPOUND_SELECT;
    if ( !strcmp( str, "VDBE_OP"         ) ) return SQLITE_LIMIT_VDBE_OP;
    if ( !strcmp( str, "FUNCTION_ARG"    ) ) return SQLITE_LIMIT_FUNCTION_ARG;
    if ( !strcmp( str, "ATTACHED"        ) ) return SQLITE_LIMIT_ATTACHED;
    if ( !strcmp( str, "LIKE_LENGTH"     ) ) return SQLITE_LIMIT_LIKE_PATTERN_LENGTH;
    if ( !strcmp( str, "VARIABLE_NUMBER" ) ) return SQLITE_LIMIT_VARIABLE_NUMBER;
    if ( !strcmp( str, "TRIGGER_DEPTH"   ) ) return SQLITE_LIMIT_TRIGGER_DEPTH;
    return -1;
}

With these two functions in place, we can create the sql_limit() SQL function by registering the sql_set_limit() C function pointer.

sqlite3_create_function( db, "sql_limit", 2, SQLITE_UTF8, 
                         NULL, sql_set_limit, NULL, NULL );

The parameters for this function include the database connection (db), the name of the SQL function (sql_limit), the required number of parameters (2), the expected text encoding (UTF-8), the user-data pointer (NULL), and finally the C function pointer that implements this function (sql_set_limit). The last two parameters are only used when creating aggregate functions, and are set to NULL.

Once the SQL function has been created, we can now manipulate the limits of our SQLite environment by issuing SQL commands. Here are some examples of what the sql_limit() SQL function might look like if we integrated it into the sqlite3 tool (we’ll see how to do this using a loadable extension later in the chapter).

First, we can look up the current COLUMN limit by passing a new limit value of -1:

sqlite> SELECT sql_limit( 'COLUMN', -1 );
2000

We verify the function works correctly by setting the maximum column limit to two, and then generating a result with three columns. The previous limit value is returned when we set the new value:

sqlite> SELECT sql_limit( 'COLUMN', 2 );
2000
sqlite> SELECT 1, 2, 3;
Error: too many columns in result set

We see from the error that the soft limit is correctly set, meaning our function is working.

One thing you might be wondering about is parameter value count. Although the sql_set_limit() function carefully checks the types of the parameters, it doesn’t actually verify that num_values is equal to two. In this case, it doesn’t have to, since it was registered with sqlite3_create_function() with a required parameter count of two. SQLite will not even call our sql_set_limit() function unless we have exactly two parameters:

sqlite> SELECT sql_limit( 'COLUMN', 2000, 'extra' );
Error: wrong number of arguments to function sql_limit()

SQLite sees the wrong number of parameters and generates an error for us. This means that as long as a function is registered correctly, SQLite will do some of our value checking for us.

If for some reason you don’t like the idea of registering the same function twice, we could also have SQLite ignore the parameter count and call our function no matter what. This leaves verification of a valid parameter count up to us. To do that, we’d start with the sql_getset_limit() function and change it to sql_getset_var_limit(), by adding this block at the top of the function:

    if ( ( num_values < 1 )||( num_values > 2 ) ) {
        sqlite3_result_error( ctx, "sql_limit(): bad parameter count", -1 );
        return;
    }

We register just one version. By passing a required parameter count of -1, we’re telling the SQLite engine that we’re willing to accept any number of parameters:

sqlite3_create_function( db, "sql_limit", -1, SQLITE_UTF8,
        NULL, sql_getset_var_limit, NULL, NULL );

Although this works, the sql_getset_limit() version is still my preferred version. The registration makes it clear which versions of the function are considered valid, and the function code is reasonably clear and compact.

Completely free-form parameter counts are usually used by items like the built-in function coalesce(). The coalesce() function will take any number of parameters (greater than one) and return the first non-NULL value in the list. Since you might pass anywhere from two to a dozen or more parameters, it is impractical to register each possible configuration, and is better to just allow the function to do its own parameter management.

On the other hand, something like sql_getset_limit() can really only accept two configurations: one parameter or two. In that case, I find it easier to explicitly register both parameter counts and allow SQLite to do my parameter verification for me.