The structure and purpose of SQL scalar functions are similar to C functions or traditional mathematical functions. The caller provides a series of function parameters and the function computes and returns a value. Sometimes these functions are purely functional (in the mathematical sense), in that they compute the result based purely off the parameters with no outside influences. In other cases, the functions are more procedural in nature, and are called to invoke specific side effects.
The body of a function can do pretty much anything you want, including calling out to other libraries. For example, you could write a function that allows SQLite to send email or query the status of a web server all through SQL functions. Your code can also interact with the database and run its own queries.
Although scalar functions can take multiple parameters, they can only return a single value, such as an integer or a string. Functions cannot return rows (a series of values), nor can they return a result set, with rows and columns.
Scalar functions can still be used to process sets of data, however. Consider this SQL statement:
SELECT format( name ) FROM employees;
In this query, the scalar function
format()
is applied to every row in the
result set. This is done by calling the scalar function over and over for
each row, as each row is computed. Even though the format()
function is only referenced once in this SQL
statement, when the query is executed, it can result in many different
invocations of the function, allowing it to process each value from the name
column.
To create a custom SQL function, you must bind an SQL function name to a C function pointer. The C function acts as a callback. Any time the SQL engine needs to invoke the named SQL function, the registered C function pointer is called. This provides a way for an SQL statement to call a C function you have written.
These functions allow you to create and bind an SQL function name to a C function pointer:
int sqlite3_create_function( sqlite3 *db, const char *func_name,
int num_param, int text_rep, void *udp,
func_ptr, step_func, final_func )
int sqlite3_create_function16( sqlite3 *db, const void *func_name,
int num_param, int text_rep, void *udp,
func_ptr, step_func, final_func )
Creates a new SQL function within a database
connection. The first parameter is the database
connection. The second parameter is the name of the
function as either a UTF-8 or UTF-16 encoded string.
The third parameter is the number of expected
parameters to the SQL function. If this value is
negative, the number of expected parameters is
variable or undefined. Fourth is the expected
representation for text values passed into the
function, and can be one of SQLITE_UTF8
, SQLITE_UTF16
, SQLITE_UTF16BE
, SQLITE_UTF16LE
, or
SQLITE_ANY
.
This is followed by a user-data pointer.
The last three parameters are all function pointers. We will look at the specific prototypes for these function pointers later. To register and create a scalar function, only the first function pointer is used. The other two function pointers are used to register aggregate functions and should be set to NULL when defining a scalar function.
SQLite allows SQL function names to be overloaded based off both the number of parameters and the text representation. This allows multiple C functions to be associated with the same SQL function name. You can use this overloading capability to register different C implementations of the same SQL function. This might be useful to efficiently handle different text encodings, or to provide different behaviors, depending on the number of parameters.
You are not required to register multiple text
encodings. When the SQLite library needs to make a function call, it
will attempt to find a registered function with a matching text
representation. If it cannot find an exact match, it will convert any
text values and call one of the other available functions. The value
SQLITE_ANY
indicates that the
function is willing to accept text values in any possible
encoding.
You can update or redefine a function by simply
reregistering it with a different function pointer. To delete a
function, call sqlite3_create_function_xxx()
with the same name,
parameter count, and text representation, but pass in NULL for all of
the function pointers. Unfortunately, there is no way to find out if a
function name is registered or not, outside of keeping track yourself.
That means there is no way to tell the difference between a create
action and a redefine action.
It is permissible to create a new function at any
time. There are limits on when you can change or delete a function,
however. If the database connection has any prepared statements that
are currently being executed (sqlite3_step()
has been called at least once, but
sqlite3_reset()
has not), you
cannot redefine or delete a custom function, you can only create a new
one. Any attempt to redefine or delete a function will return SQLITE_BUSY
.
If there are no statements currently being
executed, you may redefine or delete a custom function, but doing so
invalidates all the currently prepared statements (just as any schema
change does). If the statements were prepared with sqlite3_prepare_v2()
, they will
automatically reprepare themselves next time they’re used. If they
were prepared with an original version of sqlite3_prepare()
, any use of the statement will
return an SQLITE_SCHEMA
error.
The actual C function you need to write looks like this:
void custom_scalar_function( sqlite3_context *ctx,
int num_values, sqlite3_value **values )
This is the prototype of the C
function used to implement a custom scalar SQL
function. The first parameter is an sqlite3_context
structure, which can be used to access the user-data
pointer as well as set the function result. The
second parameter is the number of parameter
values present in the third parameter. The third
parameter is an array of sqlite3_value
pointers.
The second and third parameters
(int num_values,
sqlite3_value **values
) work together in
a very similar fashion to the traditional C main
parameters (int argc, char
**argv
).
In a threaded application, it may be possible for different threads to call into your function at the same time. As such, user-defined functions should be thread-safe.
Most user-defined functions follow a pretty
standard pattern. First, you’ll want to examine the sqlite3_value
parameters to verify
their types and extract their values. You can also extract the
user-data pointer passed into sqlite3_create_function_xxx()
. Your code can then
perform whatever calculation or procedure is required. Finally, you
can set the return value of the function or return an error condition.
SQL function parameters are passed into your C function as an array of
sqlite3_
value
structures. Each of these structures holds one parameter value.
To extract working C values from the sqlite3_value
structures, you need to
call one of the sqlite3_value_xxx()
functions. These functions are extremely similar to the sqlite3_column_xxx()
functions in use
and design. The only major difference is that these functions take a
single sqlite3_value
pointer,
rather than a prepared statement and a column index.
Like their column counterparts, the value
functions will attempt to automatically convert the value into
whatever datatype is requested. The conversion process and rules are
the same as those used by the sqlite3_column_xxx()
functions. See Table 7-1 for more details.
const void* sqlite3_value_blob( sqlite3_value *value )
double sqlite3_value_double( sqlite3_value *value )
Extracts and returns a double-precision floating point value.
int sqlite3_value_int( sqlite3_value *value )
Extracts and returns a 32-bit signed integer value. The returned value will be clipped (without warning) if the parameter value contains an integer value that cannot be represented with only 32 bits.
sqlite3_int64 sqlite3_value_int64( sqlite3_value *value )
const unsigned char* sqlite3_value_text( sqlite3_value *value )
Extracts and returns a UTF-8
encoded text value. The value will
always be null-terminated. Note that the returned
char
pointer is
unsigned and will likely require a cast. The pointer
may also be NULL if a type conversion was
required.
const void* sqlite3_value_text16( sqlite3_value *value )
const void* sqlite3_value_text16be( sqlite3_value *value )
const void* sqlite3_value_text16le( sqlite3_value *value )
Extracts and returns a UTF-16 encoded string. The first function returns a string in the native byte ordering of the machine. The other two functions will return a string that is always encoded in big-endian or little-endian. The value will always be null-terminated. The pointer may also be NULL if a type conversion was required.
There are also a number of helper functions to query the native datatype of a value, as well as query the size of any returned buffers.
int sqlite3_value_type( sqlite3_value *value )
Returns the native datatype of the value. The return value
can be one of SQLITE_
BLOB
, SQLITE_INTEGER
, SQLITE_FLOAT
, SQLITE_TEXT
, or SQLITE_NULL
. This value
can change or become invalid if a type conversion
takes place.
int sqlite3_value_numeric_type( sqlite3_value *value )
This function attempts to
convert a value into a numeric type
(either SQLITE_
FLOAT
or SQLITE_INTEGER
). If the
conversion can be done without loss of data, then
the conversion is made and the datatype of the new
value is returned. If a conversion cannot be done,
the value will not be converted and the original
datatype of
the value will be returned. This can be any value
that is returned by sqlite3_value_type()
.
The main difference between this
function and simply calling sqlite3_value_double()
or sqlite3_value_int()
, is
that the conversion will only take place if it is
meaningful and will not result in lost data. For
example, sqlite3_value_double()
will convert a
NULL into the value 0.0, while this function will
not. Similarly, sqlite3_value_int()
will convert the
first part of the string '123xyz'
into the integer 123, ignoring
the trailing 'xyz'
. This function will not, however,
because no sense can be made of the trailing
'xyz'
in a
numeric context.
int sqlite3_value_bytes( sqlite3_value *value )
Returns the number of bytes in a BLOB or in a UTF-8 encoded string. If returning the size of a text value, the size will include the null-terminator.
int sqlite3_value_bytes16( sqlite3_value *value )
Returns the number of bytes in a UTF-16 encoded string, including the null-terminator.
As with the sqlite3_column_xxx()
functions, any returned pointers
can become invalid if another sqlite3_value_xxx()
call is made against the same
sqlite3_value
structure.
Similarly, data conversions can
take place on text datatypes when calling sqlite3_
value_bytes()
or
sqlite3_value_bytes16()
. In
general, you should follow the same rules and practices as you would
with the sqlite3_column_xxx()
functions. See Result Columns for more
details.
In addition to the SQL function parameters, the
sqlite3_context
parameter
also carries useful information. These functions can be used to
extract either the database connection or the user-data pointer that
was used to create the function.
void* sqlite3_user_data( sqlite3_context *ctx )
Extracts the user-data pointer that was passed into
sqlite3_create_function_
xxx()
when the function was registered.
Be aware that this pointer is shared across all
invocations of this function within this database
connection.
sqlite3* sqlite3_context_db_handle( sqlite3_context *ctx )
Returns the database connection that was used to register this function.
The database connection returned by sqlite3_context_db_handle()
can be
used by the function to run queries or otherwise interact with the
database.
Once a function has extracted and verified its parameters, it can set about
its work. When a result has been computed, that result needs to be
passed back to the SQLite engine. This is done by using one of the
sqlite3_result_xxx()
functions. These functions set a result value in the function’s
sqlite3_context
structure.
Setting a result value is the only way your
function can communicate back to the SQLite engine about the success
or failure of the function call. The C function itself has a void
return type, so any result or
error has to be passed back through the context structure. Normally,
one of the sqlite3_result_xxx()
functions is called just prior to calling return
within your C function, but it is permissible
to set a new result multiple times throughout the function. Only the
last result will be returned, however.
The sqlite3_result_xxx()
functions are extremely similar
to the sqlite3_bind_xxx()
functions
in use and design. The main difference is that these functions take an
sqlite3_context
structure,
rather than a prepared statement and parameter index. A function can
only return one result, so any call to an sqlite3_result_xxx()
function will override the value
set by a previous call.
void sqlite3_result_blob( sqlite3_context* ctx,
const void *data, int data_len, mem_callback )
void sqlite3_result_double( sqlite3_context *ctx, double data )
void sqlite3_result_int( sqlite3_context *ctx, int data )
void sqlite3_result_int64( sqlite3_context *ctx, sqlite3_int64 data )
void sqlite3_result_null( sqlite3_context *ctx )
void sqlite3_result_text( sqlite3_context *ctx,
const char *data, int data_len, mem_callback )
void sqlite3_result_text16( sqlite3_context *ctx,
const void *data, int data_len, mem_callback )
void sqlite3_result_text16be( sqlite3_context *ctx,
const void *data, int data_len, mem_callback )
void sqlite3_result_text16le( sqlite3_context *ctx,
const void *data, int data_len, mem_callback )
Encodes a UTF-16 encoded string as a result. The first function is used for a string in the native byte format, while the last two functions are used for strings that are explicitly encoded as big-endian or little-endian, respectively.
void sqlite3_result_zeroblob( sqlite3_context *ctx, int length )
Encodes a BLOB as a result. The
BLOB will contain the number of bytes specified, and
each byte will all be set to zero (0x00
).
void sqlite3_result_value( sqlite3_context *ctx, sqlite3_value *result_value )
Encodes an sqlite3_value
as a
result. A copy of the value is made, so there is no
need to worry about keeping the sqlite3_value
parameter
stable between this call and when your function
actually exits.
This function accepts both
protected and unprotected value objects. You can
pass one of the sqlite3_value
parameters to this
function if you wish to return one of the SQL
function input parameters. You can also pass a value
obtained from a call to sqlite3_column_value()
.
Setting a BLOB or text value requires the same
type of memory management as the equivalent sqlite3_bind_xxx()
functions. The last parameter of
these functions is a callback pointer that will properly free and
release the given data buffer. You can pass a reference to sqlite3_free()
directly (assuming the
data buffers were allocated with sqlite3_malloc()
), or you can write your own memory
manager (or wrapper). You can also pass in one of the SQLITE_TRANSIENT
or SQLITE_STATIC
flags. See Binding Values for specifics on how these flags can
be used.
In addition to encoding specific datatypes, you
can also return an error status. This can be used to indicate a usage
problem (such as an incorrect number of parameters) or an environment
problem, such as running out of memory. Returning an error code will
result in SQLite aborting the current SQL statement and returning the
error back to the application via the return code of sqlite3_step()
or one of the
convenience functions.
void sqlite3_result_error( sqlite3_context *ctx,
const char *msg, int msg_size )
void sqlite3_result_error16( sqlite3_context *ctx,
const void *msg, int msg_size )
Sets the error code to SQLITE_ERROR
and sets the error message
to the provided UTF-8 or UTF-16 encoded string. An
internal copy of the string is made, so the
application can free or modify the string as soon as
this function returns. The last parameter indicates
the size of the message in bytes. If the string is
null-terminated and the last parameter is negative,
the string size is automatically computed.
void sqlite3_result_error_toobig( sqlite3_context *ctx )
Indicates the function could not process a text or BLOB value due to its size.
void sqlite3_result_error_nomem( sqlite3_context *ctx )
Indicates the function could not complete because it was unable to allocate required memory. This specialized function is designed to operate without allocating any additional memory. If you encounter a memory allocation error, simply call this function and have your function return.
void sqlite3_result_error_code( sqlite3_context *ctx, int code )
Sets a specific SQLite error code. Does not set or modify the error message.
It is possible to return both a custom error
message and a specific error code. First, call sqlite3_result_error()
(or sqlite3_result_error16()
) to set the
error message. This will also set the error code to SQLITE_ERROR
. If you want a different
error code, you can call sqlite3_result_error_code()
to override the generic
error code with something more specific,
leaving the error message untouched. Just be aware that sqlite3_
result_error()
will always set the error code to SQLITE_ERROR
, so you must set the error message
before you set a specific error code.
Here is a simple example that exposes the SQLite C API function
sqlite3_limit()
to the SQL
environment as the SQL function sql_limit()
. This function is used to adjust various
limits associated with the database connection, such as the maximum
number of columns in a result set or the maximum size of a BLOB
value.
Here’s a quick introduction to the C function
sqlite3_limit()
, which can be
used to adjust the soft limits of the SQLite environment:
int sqlite3_limit( sqlite3 *db, int limit_type, int limit_value )
For the given database connection, this sets the limit referenced by the second parameter to the value provided in the third parameter. The old limit is returned. If the new value is negative, the limit value will remain unchanged. This can be used to probe an existing limit. The soft limit cannot be raised above the hard limit, which is set at compile time.
For more specific details on sqlite3_limit()
, see sqlite3_limit() in Appendix G. You
don’t need a full understanding of how this API call works to
understand these examples.
Although the sqlite3_limit()
function makes a good example, it might not be the kind of thing you’d
want to expose to the SQL language in a real-world application. In
practice, exposing this C API call to the SQL level brings up some
security concerns. Anyone that can issue arbitrary SQL calls would
have the capability of altering the SQLite soft limits. This could be
used for some types of denial-of-service attacks by raising or
lowering the limits to their extremes.
In order to call the sqlite3_limit()
function, we
need to determine the limit_type
and value
parameters. This will require an SQL
function that takes two parameters. The first parameter will be
the limit type, expressed as a text constant. The second
parameter will be the new limit. The SQL function can be called
like this to set a new expression-depth limit:
SELECT sql_limit( 'EXPR_DEPTH', 400 );
The C function that implements the SQL
function sql_limit()
has four
main parts. The first task is to verify that the first SQL
function parameter (passed in as values[0]
) is a text value. If it is, the
function extracts the text to the str
pointer:
static void sql_set_limit( sqlite3_context *ctx, int num_values, sqlite3_value **values ) { sqlite3 *db = sqlite3_context_db_handle( ctx ); const char *str = NULL; int limit = -1, val = -1, result = -1; /* verify the first param is a string and extract pointer */ if ( sqlite3_value_type( values[0] ) == SQLITE_TEXT ) { str = (const char*) sqlite3_value_text( values[0] ); } else { sqlite3_result_error( ctx, "sql_limit(): wrong parameter type", -1 ); return; }
Next, the function verifies that the second
SQL parameter (values[1]
) is
an integer value, and extracts it into the val
variable:
/* verify the second parameter is an integer and extract value */ if ( sqlite3_value_type( values[1] ) == SQLITE_INTEGER ) { val = sqlite3_value_int( values[1] ); } else { sqlite3_result_error( ctx, "sql_limit(): wrong parameter type", -1 ); return; }
Although our SQL function uses a text value
to indicate which limit we would like to modify, the C function
sqlite3_limit()
requires a predefined integer value. We need to decode the
str
text value into an
integer limit
value. I’ll
show the code to decode_limit_str()
in just a bit:
/* translate string into integer limit */ limit = decode_limit_str( str ); if ( limit == -1 ) { sqlite3_result_error( ctx, "sql_limit(): unknown limit type", -1 ); return; }
After verifying our two SQL function
parameters, extracting their values, and translating the text limit indicator
into a proper integer value, we finally call sqlite3_
limit()
. The result is set as the result value
of the SQL function and the function returns:
/* call sqlite3_limit(), return result */ result = sqlite3_limit( db, limit, val ); sqlite3_result_int( ctx, result ); return; }
The decode_limit_str()
function is very simple, and
simply looks for a predefined set of text values:
int decode_limit_str( const char *str ) { if ( str == NULL ) return -1; if ( !strcmp( str, "LENGTH" ) ) return SQLITE_LIMIT_LENGTH; if ( !strcmp( str, "SQL_LENGTH" ) ) return SQLITE_LIMIT_SQL_LENGTH; if ( !strcmp( str, "COLUMN" ) ) return SQLITE_LIMIT_COLUMN; if ( !strcmp( str, "EXPR_DEPTH" ) ) return SQLITE_LIMIT_EXPR_DEPTH; if ( !strcmp( str, "COMPOUND_SELECT" ) ) return SQLITE_LIMIT_COMPOUND_SELECT; if ( !strcmp( str, "VDBE_OP" ) ) return SQLITE_LIMIT_VDBE_OP; if ( !strcmp( str, "FUNCTION_ARG" ) ) return SQLITE_LIMIT_FUNCTION_ARG; if ( !strcmp( str, "ATTACHED" ) ) return SQLITE_LIMIT_ATTACHED; if ( !strcmp( str, "LIKE_LENGTH" ) ) return SQLITE_LIMIT_LIKE_PATTERN_LENGTH; if ( !strcmp( str, "VARIABLE_NUMBER" ) ) return SQLITE_LIMIT_VARIABLE_NUMBER; if ( !strcmp( str, "TRIGGER_DEPTH" ) ) return SQLITE_LIMIT_TRIGGER_DEPTH; return -1; }
With these two functions in place, we can
create the sql_limit()
SQL
function by registering the sql_set_limit()
C function pointer.
sqlite3_create_function( db, "sql_limit", 2, SQLITE_UTF8, NULL, sql_set_limit, NULL, NULL );
The parameters for this function include the
database connection (db
), the
name of the SQL function (sql_limit
), the required number of parameters
(2
), the expected text
encoding (UTF-8), the user-data pointer (NULL), and finally the
C function pointer that implements this function (sql_set_limit
). The last two
parameters are only used when creating aggregate functions, and
are set to NULL.
Once the SQL function has been created, we
can now manipulate the limits of our SQLite environment by
issuing SQL commands. Here are some examples of what the
sql_limit()
SQL
function might look like if we integrated it into the sqlite3
tool (we’ll see how to
do this using a loadable extension later in the chapter).
First, we can look up the current COLUMN
limit by passing a new
limit value of -1
:
sqlite> SELECT sql_limit( 'COLUMN', -1 );
2000
We verify the function works correctly by setting the maximum column limit to two, and then generating a result with three columns. The previous limit value is returned when we set the new value:
sqlite>SELECT sql_limit( 'COLUMN', 2 );
2000 sqlite>SELECT 1, 2, 3;
Error: too many columns in result set
We see from the error that the soft limit is correctly set, meaning our function is working.
One thing you might be wondering about is
parameter value count. Although the sql_set_limit()
function carefully checks the
types of the parameters, it doesn’t actually verify that
num_values
is equal to
two. In this case, it doesn’t have to, since it was registered
with sqlite3_create_function()
with a required
parameter count of two. SQLite will not even call our sql_set_limit()
function unless
we have exactly two parameters:
sqlite> SELECT sql_limit( 'COLUMN', 2000, 'extra' );
Error: wrong number of arguments to function sql_limit()
SQLite sees the wrong number of parameters and generates an error for us. This means that as long as a function is registered correctly, SQLite will do some of our value checking for us.
While having a fixed parameter count
simplifies the verification code, it might be useful to provide
a single-parameter version that can be used to look up the
current value. This can be done a few different ways. First, we
can define a second C function called sql_get_limit()
. This function would be the
same as sql_set_limit()
, but
with the second block of code removed:
/* remove this block of code from a copy of */ /* sql_set_limit() to produce sql_get_limit() */ if ( sqlite3_value_type( values[1] ) == SQLITE_INTEGER ) { val = sqlite3_value_int( values[1] ); } else { sqlite3_result_error( ctx, "sql_limit(): wrong parameter type", -1 ); return; }
With this code removed, the function will
never decode the second SQL function parameter. Since val
is initialized to –1, this
effectively makes every call a query call. We register each of
these functions separately:
sqlite3_create_function( db, "sql_limit", 1, SQLITE_UTF8, NULL, sql_get_limit, NULL, NULL ); sqlite3_create_function( db, "sql_limit", 2, SQLITE_UTF8, NULL, sql_set_limit, NULL, NULL );
This dual registration overloads the SQL
function name sql_limit()
.
Overloading is allowed because the two calls to sqlite3_create_function()
have a
different number of required parameters. If the SQL function
sql_limit()
is called
with one parameter, then the C function sql_get_limit()
is called. If two parameters
are provided to the SQL function, then the C function sql_set_limit()
is
called.
Although the two C functions sql_get_limit()
and sql_set_limit()
provide the
correct functionality, the majority of their code is the same.
Rather than having two functions, it might be simpler to combine
these two functions into one function that can deal with either
one or two parameters, and is capable of both getting or setting
a limit value.
This combine sql_getset_limit()
function can be created by
taking the original sql_set_limit()
function and modifying the
second section. Rather than eliminating it, as we did to create
sql_get_limit()
, we’ll
simply wrap it in an if
statement, so the second section (which extracts the second SQL
function parameter) is only run if we have two
parameters:
/* verify the second parameter is an integer and extract value */ if ( num_values == 2 ) { if ( sqlite3_value_type( values[1] ) == SQLITE_INTEGER ) { val = sqlite3_value_int( values[1] ); } else { sqlite3_result_error( ctx, "sql_limit(): wrong parameter type", -1 ); return; } }
We register the same sql_getset_limit()
C function
under both parameter counts:
sqlite3_create_function( db, "sql_limit", 1, SQLITE_UTF8, NULL, sql_getset_limit, NULL, NULL ); sqlite3_create_function( db, "sql_limit", 2, SQLITE_UTF8, NULL, sql_getset_limit, NULL, NULL );
For this specific task, this is likely the
best choice. SQLite will verify the SQL function sql_limit()
has exactly one or
two parameters before calling our C function, which can easily
deal with either one of those two cases.
If for some reason you don’t like the idea
of registering the same function twice, we could also have
SQLite ignore the parameter count and call our function no
matter what. This leaves verification of a valid parameter count
up to us. To do that, we’d start with the sql_getset_limit()
function and
change it to sql_getset_var_limit()
, by adding this block at
the top of the function:
if ( ( num_values < 1 )||( num_values > 2 ) ) { sqlite3_result_error( ctx, "sql_limit(): bad parameter count", -1 ); return; }
We register just one version. By passing a
required parameter count of -1
, we’re telling the SQLite engine that we’re
willing to accept any number of parameters:
sqlite3_create_function( db, "sql_limit", -1, SQLITE_UTF8, NULL, sql_getset_var_limit, NULL, NULL );
Although this works, the sql_getset_limit()
version is
still my preferred version. The registration makes it clear
which versions of the function are considered valid, and the
function code is reasonably clear and compact.
Completely free-form parameter counts are
usually used by items like the built-in
function
coalesce()
. The coalesce()
function will take
any number of parameters (greater than one) and return the first
non-NULL value in the list. Since you might pass anywhere from
two to a dozen or more parameters, it is impractical to register
each possible configuration, and is better to just allow the
function to do its own parameter management.
On the other hand, something like sql_getset_limit()
can really
only accept two configurations: one parameter or two. In that
case, I find it easier to explicitly register both parameter
counts and allow SQLite to do my parameter verification
for me.