Functions

A function is an identifier that instructs PostgreSQL to perform a programmatic operation within a SQL statement. A function returns a single value from its operation, and that value is then used in the SQL statement where the function was invoked. This process is similar to the way operators return their results in the location from which they were called in the query. (In fact, operators are technically pointers to built-in system functions, and are sometimes called “syntactic sugar” for functions, as they are a syntactically convenient way to call underlying functions.)

To use a function in a SQL statement, type the function’s name, followed by its list of parameters (called arguments), if any. The arguments passed to a function are enclosed in parentheses. There are two general styles of entering arguments: the standard SQL92 functions are generally implemented so that they accept their arguments delimited by special SQL keywords, such as FROM, FOR, and USING. PostgreSQL-style functions, on the other hand, accept arguments delimited by commas (which you might expect if you have experience with a programming language such as C).

Arguments may be constants, valid identifiers, or expressions. The particular arguments you need to pass to a function will depend completely on the function being used, and its requirements: especially with regards to data types. With a couple of exceptions, all functions require the open and closing parentheses following the function name, even if no arguments are passed.

sql92_style_function ( { argument | KEYWORD } [...] )
pgsql_style_function ( argument [, ...] )

A powerful use of functions is that they may be nested, provided that the data type returned by a nested function is compatible with the argument accepted by the function it is nested within. Functions may be nested to any depth:

function_name ( nested_function_name ( arguments [, ...] ) [, ...] )

PostgreSQL defines a rich set of functions for its built-in data types. To view a complete list of functions available, execute the \df slash command within psql. PostgreSQL also supports extensibility of its function set through the CREATE FUNCTION command. See Chapter 7 for more on this topic.

The mathematical functions provided for PostgreSQL operate on a variety of numeric data types, and generally return a value of the same type as the function’s arguments. They can perform many useful and common arithmetic and trigonometric operations; Table 5-9 provides an overview of some of the most common mathematical functions in PostgreSQL.

The following sections elaborate on each of the functions described in Table 5-9, detailing required arguments, data types, and functionality. Note that while a function will usually only accept one of a set of data types as its arguments, PostgreSQL will attempt to implicitly convert supplied arguments to the required types, if necessary. If an implicit type conversion fails, PostgreSQL will supply the appropriate error message, and you may need to use an explicit type conversion. See Chapter 3 for more information on explicitly converting types.

PostgreSQL supports a wide variety of text formatting, analysis and comparison functions. These include both SQL92 standard functions, such as substring( ) and trim( ), as well as PostgreSQL-specific extensions, such as ltrim( ), rtrim( ) and substr( ). Table 5-10 lists the functions available to PostgreSQL for use with character strings. In general, when referring to a value of type text, it is functionally synonymous with a value of type character, or varchar.

Table 5-10. Character string functions

Function

Description

ascii(s)

Returns the ascii code of the first character passed to it in character string s

btrim(s [, t])

Returns character string s, trimmed on the left and right of any substrings consisting solely of letters in character string t (or whitespace, if t is not specified)

char_length(s)

Returns the numeric length of character string s

chr(n)

Returns the character whose ascii value corresponds to the number n

s ilike(f)

Returns true if the expression f is found to match (case-insensitively) s

initcap(s)

Returns the character string s, with each word’s first letter capitalized

length(s)

Returns the numeric length of character string s

s like(f)

Returns true if the expression f is found to match s

lower(s)

Returns the string s, in all lowercase

lpad(s, n [, c])

Returns the character string s, padded to the left with character string c (or whitespace, if c is not defined) to length of n characters (or truncated on the right to n characters)

ltrim(s [, f])

Returns character string s, trimmed on the left of a substring consisting solely of letters in character string f (or whitespace, if f is not specified)

octet_length(s)

Returns the number of 8-bit bytes in character string s

position(b IN s)

Returns the location of character sub-string b in character string s (counting from 1)

repeat(s, n)

Returns the character string s, repeated n times

rpad(s, n [, c])

Returns the character string s, padded to the right with character string c (or whitespace, if c is not specified) to length of n characters (or truncated on the left to n characters)

rtrim(s [, f])

Returns character string s, trimmed on the right of a substring consisting solely of letters in character string f (or whitespace, if f is not specified)

strpos(s, b)

Returns the location of character sub-string b in character string s (counting from 1). This is a PostgreSQL specific function which duplicates the effect of the SQL position( ) function, using C style arguments.

substr(s, n [, l])

Returns a character sub-string of the character string s, starting at digit n (counting from 1), with optional maximum length l characters

substring(s FROM n FOR l)

Returns a character sub-string of the character string s, starting at digit n (counting from 1), with optional maximum length l characters

to_ascii(s, f)

Returns text s converted from multibyte encoding format f to plain ASCII

translate(s, f, r)

Returns the character string s, with any found characters from string f replaced with corresponding character in string r

trim(side f FROM s)

Returns character string s, trimmed of leading and/or trailing substrings which consist solely of letters in character string f, as dictated by the side keyword (which is either LEADING, TRAILING or BOTH)

upper(s)

Returns the character string s, converted to all uppercase

The following sections describe each of these character string functions, detailing their argument requirements, return types, and general usage.

s like(f)
s LIKE f
like(s, f)
s ilike(f)
s ILIKE f

The like( ) function checks the expression described by f, and attempts to see if it matches the character string s. It may either accept two arguments of type text, s and f, or it may be used in a special SQL syntax format where the argument s precedes the function name, adding to the readability of the statement. The ilike( ) function is a non-standard, case-insensitive version of like( ), and may only be invoked through the SQL-style syntax.

The use of like( ) differs from a normal equivalence operation in that the character string f may contain either an underscore (_) or percent (%) symbol to indicate special meaning in matching character values. PostgreSQL interprets the _ symbol as indicating that any single character should be considered a match, while the % symbol is interpreted as indicating that zero or more characters of any value will be considered a match. These special characters may be interspersed throughout the character string f.

For more advanced pattern matching capabilities within PostgreSQL, see the section titled Regular expression matching operators earlier in this chapter.

translate(s, f, r)

The translate( ) function accepts three arguments, s, f and r, each of type text. It replaces any instance of a character in the string s that matches any character in f with the corresponding character at the same index from string r. The result is returned as a value of type text.

Note that this function does not replace only complete instances of the character string f, but replaces any character within s that matches any character in f with the corresponding character from r. If there are more characters in f than in r, any character in f without a corresponding character in r will simply be omitted (this can be a useful way to remove unwanted characters).

The important thing to remember about this method of replacement is that there is always a one-to-one relationship between the character found and its replacement character (though its replacement may be empty, if omitted).

The following examples replace all question marks with exclamation points.

The next example replaces all instances of the character i with the character w, and all instances of the character s with the character a. The extra s at the end of “was” is ignored.

booktown=# SELECT translate('This is a mistake.', 'is', 'was');
     translate
--------------------
 Thwa wa a mwatake.
(1 row)

This final example replaces all vowels with nothing, effectively removing all vowels from the input strings.

booktown=# SELECT title,
booktown-#        translate(title, 'aeiouAEIOU', '') AS vowelless
booktown-#        FROM books
booktown-#        LIMIT 5;
            title            |    vowelless
-----------------------------+--------------------
 The Shining                 | Th Shnng
 Dune                        | Dn
 2001: A Space Odyssey       | 2001:  Spc dyssy
 The Cat in the Hat          | Th Ct n th Ht
 Bartholomew and the Oobleck | Brthlmw nd th blck
(5 rows)

The standard SQL92 date and time functions (current_date, current_time, current_timestamp, and extract( )) are each supported by PostgreSQL, as well as a variety of PostgreSQL-specific extensions. Each of PostgreSQL’s date and time retrieval and extraction functions are listed in Table 5-11.

The following sections elaborate on each of PostgreSQL’s date and time functions described in Table 5-11. Note that the syntax for the current_date, current_time and current_timestamp functions omits the parentheses. This is done to remain compliant with the SQL92 standard requirements.

date_part(s, t)
date_part(s, i)

The date_part( ) function accepts two arguments, s of type text, and either t of type timestamp, or i of type interval. The function removes the part of the time length specified by s, and returns it as a value of type double precision.

To understand the function of date_part( ), it can be helpful to think of a timestamp or interval value as being broken up into several fields. These fields each describe a discrete component of the temporal value, such as the number of days, hours, or minutes described. The valid values for time field units described by s are detailed in Table 5-12. Notice that some values are only appropriate for use with a timestamp value, and not with an interval.

Table 5-12. Timestamp and interval units

Unit

Description

century

Describes the year field, divided by 100 (will not describe the literal century)

day

Describes the day field, from 1 to 31, for a timestamp, or the total number of days for an interval

decade

Describes the year field, divided by 10

dow

Describes the day of the week field, from 0 to 6 (beginning on Sunday), for a timestamp, not applicable to an interval

doy

Describes the day of the year field, from 1 to 365 or 366 for a timestamp value, not application to an interval

epoch

Describes the number of seconds since the epoch (Jan 1, 1970) for a timestamp, or total number of seconds for an interval

hour

Describes the hour represented by a timestamp

microseconds

Describes the millionths of seconds following the decimal in the seconds field of a timestamp value

millennium

Describes the year field, divided by 1000 (will not describe the literal millennium)

milliseconds

Describes the thousandths of seconds following the decimal in the seconds field of a timestamp value

minute

Describes the minutes field of a timestamp or interval value

month

Describes the month of the year for a timestamp value, or the number of months modulo 12 for interval values

quarter

Describes the quarter of the year, from 1 to 4, for timestamp values

second

Describes the seconds field of a timestamp or interval value

week

Describes the week of the year of a timestamp value. ISO-8601 defines the first week of the year to be the week containing January 4.

year

Describes the year field of a timestamp or interval value

While PostgreSQL is able to explicitly cast between most commonly used data types, some conversions require a function in order to meaningfully translate values. Some of PostgreSQL’s commonly used type conversion functions are listed in Table 5-13. These are detailed in the following sections.

to_char(n, f)

The to_char( ) function, when used with argument n of type numeric and argument f, of type text, formats the numeric value of n to a character string returned as type text. The character string f describes the character string format within which to place the value of n.

The f format string consists of a series of meta-characters, which PostgreSQL translates into the literal values they represent. Valid meta-characters that may be used within this format string for a numeric conversion are outlined in Table 5-14.

When more digits are specified with the 9 character in the format string than are within the numeric value n, the extra digits will be padded with whitespace. When more digits are specified with the 0 character, the extra digits will be padded with zeroes.

If fewer digits are specified then are necessary to represent the digits to the left of the decimal, the meaning of the conversion becomes ambiguous, as significant digits must be omitted. Since it is unclear which digits should be omitted, the to_char( ) function will enter the # character in place of each specified digit. It is therefore important to specify the maximum number of digits that you expect to receive back from the translation. You should also use a function such as translate( ) or one of the trim functions to remove unwanted whitespace from the translation.

Literal versions of meta-characters may be used within the format string by surrounding them with double quotes. Doing this within the format string changes the quoted meta-characters so they are interpreted literally. Note that in order to use a literal double-quote within this scheme, two backslashes must prefix the double-quote, as it is essentially twice escaped.

to_char(t, f)

When used with argument t of type timestamp and argument f of type text the to_char function formats the date and time represented by t to a character string returned as type text.

As with the numeric functionality of to_char( ), the character string f describes the meta-characters which are translated by PostgreSQL into the literal values they represent. Valid meta-characters that may be used within this format string for date and time values are outlined in Table 5-15.

Table 5-15. Timestamp conversion formatting characters

Character

Description

HH, HH12

The hour of day, from 1 to 12

HH24

The hour of the day, from 0 to 23

MI

The minute, from 0 to 59

SS

The second, from 0 to 59

SSSS

The seconds past midnight, from 0 to 86,399

AM, PM, A.M., P.M.

The meridian indicator in uppercase, with optional periods

am, pm, a.m., p.m.

The meridian indicator in lowercase, with optional periods

TZ, tz

The time zone, in upper or lowercase

CC

The two-digit century (not the year divided by 100)

Y, YY, YYY, YYYY, Y,YYY

The year’s last digit, last two digits, last three digits, or last four digits (with optional comma)

BC, AD, B.C., A.D.

Year qualifier, in uppercase

bc, ad, b.c., a.d.

Year qualifier, in lowercase

MONTH, Month, month

The full month name, padded on the right with blanks to 9 characters in length, in uppercase, init-capped, or lowercase

MON, Mon, mon

The abbreviated 3-letter month, in uppercase, init-capped, or lowercase

MM

The month number, from 1 to 12

RN, rn

The month in Roman Numerals, from I to XII, in upper or lowercase

DAY, Day, day

The full day name, padded on the right to 9 characters in length, in uppercase, init-capped, or lowercase

DY, Dy, dy

The abbreviated 3-letter day, in uppercase, init-capped, or lowercase

DDD, DD, D

The day of the year, from 1 to 366, day of the month, from 1 to 31, or day of the week, from 1 to 7 (beginning on Sunday)

W

The week of the month, from 1 to 5 (from the 1st day of the month)

WW

The week of the year, from 1 to 53 (from the 1st day of the year)

IW

The ISO week of the year (from the 1st Thursday of the new year)

TH, th

The appropriate ordinal suffix for the preceding numeric value, upper or lowercase

fm

Causes extra padding to be omitted, including whitespace, and extra zeroes

The TH suffix and FM prefix must be directly adjacent to the value they are modifying. For example, to apply FM to the Day value, the complete sequence would be FMDay (not FM Day). Similarly, to attach the ordinal suffix to the DD day of the month, the complete sequence would be DDTH (not DD TH).

to_date(s, f)

The to_date( ) function accepts two arguments s and f, each of type text. The argument f describes, using the date-specific meta-characters detailed in Table 5-15, the format of the date described by the string s. The result is returned as type date.

While PostgreSQL can figure out a wide variety of date formats, it cannot support every arbitrary date format. The to_date( ) function insures that, provided the format can be described using the meta-characters from Table 5-14, nearly any date format can be converted to a valid date value.

An aggregate function is a special kind of function that operates on several rows of a query at once, returning a single result. Such functions are generally only used in queries which make use of the GROUP BY clause to associate rows together by like criteria, though they may be used in queries which only contain aggregate functions in their target list. When performing the latter, the aggregate function operates on all selected rows from the result set.

Table 5-16 provides an overview of PostgreSQL’s supported aggregate functions. To see a complete list of aggregate functions, you may use the \da command within psql.

The following sections describe each aggregate function in further detail, including specific information on usage, examples, and valid input data types. In each of the functional explanations, the term expression refers to any valid identifier in a result set, or any valid expression operating on such an identifier.

When calling an aggregate function, aggregate expressions are employed to describe an expression from the result set created by the SELECT statement. An aggregate expression is similar to an ordinary SQL expression, but may be preceded by either the ALL or the DISTINCT keyword.

The use of the DISTINCT keyword in an aggregate expression causes only grouped rows with unique values (as described by the expression) to be evaluated by the function. Any duplicate rows will be suppressed. Similar to the use of the ALL keyword in a SELECT statement, the use of ALL in an aggregate expression has no function other than to make more explicit the request for all grouped rows to be evaluated to the function. Example 5-82 demonstrates each of the aggregate expression forms.

There is one final form of aggregate expression, as demonstrated by the all_rows result column in Example 5-82. When the asterisk (*) symbol is supplied as the aggregate expression, it instructs the aggregate function to evaluate all rows, including rows with values of NULL, which are ordinarily ignored. Since the subjects table contains one row with a NULL value in the location column, the counted rows for location differ from those counted for *.