Data Types

SQL is considered a strongly typed language. This means that any piece of data represented by PostgreSQL has an associated data type, even if it is not plainly obvious. A data value’s type both defines and constrains the kinds of operations which may be performed on it.

Not only is every piece of data associated with a type, but types play a large part in the construction of tables. As stated in the section titled Introduction to Relational Databases, tables are made up of one or more columns. These columns must, in addition to having a name, have a specific data type.

Note

Data Types

While PostgreSQL provides a wide variety of built-in data types, you also have the option to add new data types to PostgreSQL using the CREATE TYPE command. See the reference entry on CREATE TYPE for more on this command.

Table 3-10 lists the data types officially supported by PostgreSQL, as well as any PostgreSQL recognized aliases (alternative names that are identical in connotation). There are many other internal (meaning they are no longer intended for normal use) or deprecated (outdated, and discouraged) data types available that are unlisted.

Additionally, while most of the data types implemented in PostgreSQL are directly derived from SQL standards, there are some actively maintained data types that are non-standard (such as the geometric and spacial types). Therefore, you will not always be able to find equivalent types on other SQL-capable database management systems.

Table 3-10. PostgreSQL supported data types

Category

Data type

Description

Standardization

Boolean & binary types

boolean, bool

A single true or false value.

SQL99

 

bit(n)

An n -length bit string (exactly n binary bits).

SQL92

 

bit varying(n), varbit(n)

A variable n -length bit string (up to n binary bits)

SQL92

Character types

character (n), char(n)

A fixed n-length character string.

SQL89

 

character varying(n), varchar(n)

A variable length character string of up to n characters.

SQL92

 

text

A variable length character string, of unlimited length.

PostgreSQL-specific

Numeric types

smallint, int2

A signed 2-byte integer.

SQL89

 

integer, int, int4

A signed 4-byte integer.

SQL92

 

bigint, int8

A signed 8-byte integer, up to 18 digits in length.

PostgreSQL-specific

 

real, float4

A 4-byte floating-point number.

SQL89

 

double precision, float8, float

An 8-byte floating-point number.

SQL89

 

numeric(p, s), decimal(p, s)

An exact numeric type with arbitrary precision p, and scale s.

SQL99

 

money

A fixed precision, U.S.-style currency.

PostgreSQL-specific, deprecated.

 

serial

An auto-incrementing 4-byte integer.

PostgreSQL-specific

Date and time types

date

The calendar date (day, month and year).

SQL92

 

time

The time of day.

SQL92

 

time with time zone

The time of day, including time zone information.

SQL92

 

timestamp (includes time zone)

Both the date and time.

SQL92

 

interval

An arbitrarily specified length of time.

SQL92

Geometric types

box

A rectangular box in a 2D plane.

PostgreSQL-specific

 

line

An infinite line in a 2D plane.

PostgreSQL-specific

 

lseg

A finite line segment in a 2D plane.

PostgreSQL-specific

 

circle

A circle with center and radius.

PostgreSQL-specific

 

path

Open and closed geometric paths in a two-dimensional plane.

PostgreSQL-specific

 

point

geometric point in a 2D plane

PostgreSQL-specific

 

polygon

A closed geometric path in a 2D plane.

PostgreSQL-specific

Network types

cidr

An IP network specification.

PostgreSQL-specific

 

inet

A network IP address, with optional subnet bits.

PostgreSQL-specific

 

macaddr

A MAC address (e.g., an Ethernet card’s hardware address).

PostgreSQL-specific

System types

oid

An object (row) identifier.

PostgreSQL-specific

 

xid

A transaction identifier.

PostgreSQL-specific

Remaining true to theme, the following sections on data types will describe in further detail each of the most widely used and practical types. This book will not go into detail on the non-standard and/or more esoteric types, such as the geometric, network and bitwise types. These sections include information on valid usage, storage considerations, input and output formats and general syntactic conventions. Before we go much further on specific data types there are a couple of topics worth discussing, including the NULL keyword.

Despite the previously discussed rule that a column can have only one data type and logically accept only that type, there is a value that all columns can be defined as, no matter what their data type. This is the value a column is set to when you use the SQL keyword NULL. Essentially, NULL has no data value, so it is not considered a type; it is a system value that indicates to the database that the field it is located within contains no value. The only exception to the rule that any column can contain a NULL is when the NOT NULL constraint is specified for a column.

NULL is often used in places where a value is optional. It can be a convenient way of omitting data without having to resort to strange or arbitrary conventions, such as storing negative values in an integer field to represent omitted data. While your system requirements may change over time, the connotation of NULL is always NULL.

NULL can be thought of as a meta-value: a value that represents a lack of a value, which will never be equivalent to a non-NULL value. One problem often encountered when working with NULL values is that they are easily confused with empty character strings, which return a blank value to the client when selected. The reason this can be confusing is that NULL values also return a blank value when selected; however, they are completely different than empty character strings and this must be understood in order to avoid creating faulty queries or code. A character string column that contains a blank value still contains a string of characters, though the characters that compose the string are blank; thus, there is still a value in the column. A NULL value represents the complete absence of value within the column, not that it is merely blank.

This is an important distinction, as the rules for SQL operations involving the NULL value are quite different than the rules for operations involving empty string values. This internal distinction is especially important in reference to joins, which are discussed in Chapter 4.

The return of both NULL and empty values is shown in Example 3-15, which retrieves a set of five books from the books table. The first SELECT query shows that there appear to be two books which have been inserted without titles. Upon successive querying, however, it becomes clear that while neither have visible titles, one of the books has an empty value for its title (id 100), while the other has a NULL value.

Example 3-16 demonstrates a more practical (and likely) use of NULL in a table called editions, which relates a book’s ISBN number to its publication date.

NULL might be used in this manner in order to represent books with editions that are not yet published, or for books whose publication date was unknown when entered into the database. It could be misleading to supply some arbitrarily illogical date for a book fitting either of these criteria, and in both cases, NULL makes sense as a solution.

A Boolean value is a simple data structure which can only represent values of true or false. PostgreSQL supports the SQL99-defined boolean data type, with a PostgreSQL-specific alias of bool.

Like all other data types, Boolean values can also be set to NULL. If a Boolean is set to NULL, it will never be interpreted as either true or false; it will be interpreted as NULL. This may seem obvious, but it is significant in situations where you may think to check for NULL Booleans by checking for false values (which won’t work). You must use IS NULL to check for NULL Booleans. The ability to be true, false, or NULL (and its related rules regarding the designation of NULL as not being true or false) is known as three-valued logic.

Table 3-11 shows the valid constant values for a true or false state that are recognized by PostgreSQL. Which convention you choose to employ is dependent solely on your own preference. All variations of true, as well as all variations of false, are interpreted identically by the server.

Note

Supported true or false constants

If you decide to use the constants listed in Table 3-11, every value (except for true and false) must be enclosed within single quotes. Failure to do so will result in a server error.

Example 3-17 creates a table named daily_inventory that logs what books are stock and which are not, correlating an ISBN number with a Boolean value. Once created, the table is populated with data via a series of INSERT statements involving a string constant (the ISBN number), and a variety of valid Boolean constants.

Now that the table has been populated with records, a SELECT query may be issued to easily check which books are in stock, as shown in Example 3-18.

With a Boolean column you have the ability to imply a true value by referencing the column name without any kind of operator or modifying keyword. This can lead to more intuitive looking queries for well-designed tables, as shown in Example 3-19.

Although the second query does not specify ‘true’ or ‘false’, it implicitly looks for a value of ‘true’ by omitting a comparison operator.

Similarly, if you want to search for false values, you may either compare the named column’s value against any of the valid boolean constants in Table 3-11, or you may use the SQL keyword NOT just before the column name. Each method is demonstrated in Example 3-20.

In this way, you can see how SQL was designed with human readability in mind. By naming your tables and columns in well-designed terms, a SQL query can read almost as plainly as an English sentence.

For the more programming-oriented readers, it may be of interest that you can use the inequality (!=) operator to compare the value of a boolean field against any of the values in Table 3-11 (e.g., WHERE in_stock != 't'). As such, the following three syntactic variations are each equivalent:

SELECT * FROM daily_inventory WHERE NOT in_stock;
SELECT * FROM daily_inventory WHERE in_stock = 'no';
SELECT * FROM daily_inventory WHERE in_stock != 't';

You may have noticed that while seven rows were inserted into the table in Example 3-17, only six rows were returned between the books found in stock, and those found out of stock. This is due to the last insertion in Example 3-17 not supplying a value at all for the in_stock column, leaving the record for the book with ISBN 0451160916 with a NULL value in the in_stock column.

As stated previously, NULL will not register as either true or false. As such, you may use the SQL phrase IS NULL to check for rows with NULL values. Alternatively, you may use != but you will risk portability issues with other databases. The following syntax demonstrates a SQL query which uses the IS NULL phrase:

booktown=# SELECT * FROM daily_inventory WHERE in_stock IS NULL;
    isbn    | in_stock
------------+----------
 0451160916 |
(1 row)

Since IS NULL is a general SQL phrase, you can use the same WHERE clause in an UPDATE statement to correct any accidental NULL values.

Character types are required any time that you wish to reference character data, such as blocks of ASCII text. They are commonly used for storing names, addresses, and so on.

SQL provides two character types called character, and character varying. In addition to these, a general text type is supported by PostgreSQL, which does not require an explicitly declared upper limit on the size of the field. Columns of type text are automatically re-sized according to the data you put in them, and they may re-size without boundaries (discounting, of course, the 1GB limit for a single field). Table 3-12 shows the available character data types within PostgreSQL.

The n in Table 3-12 represents an arbitrarily specified number of characters. This number is specified for a column when a table is created.

PostgreSQL’s numeric types are used to represent both integers and decimal floating-point values. From a general perspective, PostgreSQL’s supported numeric types consist of:

PostgreSQL has support for special types which fall under the family of numeric types, including the deprecated money type, and the special serial construct.

As shown in Table 3-13, several of PostgreSQL’s data types have aliases that are equivalent to their associated data types. This was done for ease of use, but at times it can be confusing, due to the fact that some of the aliases sound familiar. If you are not careful to understand what data type an alias you are using is associated with, you may accidentally reference the wrong data type. For example, in PostgreSQL the real and double precision data types represent values that in many other languages are referred to as float values; however, they both have aliases that contain the word “float” (float and float8 link to double precision; float4 links to real). Problems may result if if you attempt to use the float alias, thinking it is linked to real, when in fact it is associated with double precision.

The numeric (also known as decimal) type is a specially designed numeric data type that can represent arbitrarily large and precise values within a fixed length that is given by the user. When you create a table with a column of type numeric, you may specify in parentheses two values: the precision and the scale.

The precision is the maximum number of digits that the numeric value may hold (including digits to the right of the decimal point), while the scale describes how many of those digits of precision are to be to the right of the decimal point. If left unspecified, the precision will default to 30 digits, and scale to 6 digits. The maximum precision (and, hence, the maximum scale) you can set this to is 1,000. Setting the precision to 1,000 would allow a maximum 1,000 digits, which should be fairly adequate for most needs.

Unlike the floating-point data types, you will receive an overflow error if you attempt to insert a number that is larger than the allotted precision range. Beside this limitation, you should be able to insert any number that fits within the provided precision and scale of the numeric type column.

For example, in a numeric(11,6) column, you may safely insert the value 9.999999 with two digits too many to the right of the decimal point (though the value is rounded up to 10.000000). However, an attempt to insert the value 99999.99999999 will fail, as shown in Example 3-22.

Problems that arise from trying to insert values that are two large can be avoided by using the trunc() numeric truncating function within an INSERT command to make sure a number is truncated to a size suitable for the column it is being inserted into. You must provide the length it should be truncated to, which means you’ll have to be aware of the precisions you’ve previously specified. The use of trunc() is also illustrated within Example 3-22.

The serial type is a non-standard but useful shortcut which allows you to easily create an identifier column within a table that contains a unique value for each row. The serial type literally combines the functionality of a 4-byte integer data type, an index, and a sequence. Example 3-24 shows the serial type being used to generate a unique identifier for each row in a table named auto_identified.

Example 3-25 shows the same thing being accomplished using an integer column, the nextval() function, and a sequence. As of the writing of this book, these two methods are functionally identical.

See Chapter 7, for more information on using sequences.

Date and time types are a convenient way to store date and time related data in a uniform SQL data structure, without having to worry about the conventions involved with storage (e.g., if you were to try to store such information in a character data type). PostgreSQL uses Julian dates for all date and time calculations. Julian date representation is the commonly used January through December calendar that you are most likely familiar with. By fixing the length of a year at about 365.24 days, Julian dates can correctly calculate any date after 4713 BC, as well as far into the future.

PostgreSQL supports all of the SQL92-defined date and time types shown in Table 3-14, as well as some PostgreSQL-specific extensions to help with SQL92’s time-zone limitations.

Date input can be accepted by PostgreSQL in many common formats, including the ISO-8601 format, the traditional SQL format, the original PostgreSQL format, and more. Table 3-15 lists several of these date formats.

These formats are relevant to the date and the timestamp data types.

When specifying a named month in a date value to PostgreSQL, you may either type the complete month name, or choose from a set of defined abbreviations for each month. These abbreviations are listed in Table 3-16.

Similarly, Table 3-17 lists recognized abbreviations for weekday names.

Despite the wide variety of ways in which PostgreSQL can interpret date values, the values are always stored uniformally, and will be returned in a consistent format. As such, you have a variety of methods available to you to customize the default behavior with which date and time values are returned to you.

To set the general date/time output format, the SET command can be applied to the run-time variable DATESTYLE. This variable may be set to one of four available general styles shown in Table 3-18.

As an example, you can use the following SQL statement to set the date style to SQL:

booktown=# SET DATESTYLE TO SQL;
SET VARIABLE

If you perform a SELECT current_timestamp query after setting this variable, PostgreSQL should return the current time using the ISO format as instructed:

booktown=# SELECT current_timestamp;
         timestamp
----------------------------
 08/10/2001 13:25:55.00 PDT
(1 row)

The SHOW command can be used to display the current value of the DATESTYLE variable while PostgreSQL is running.

booktown=# SHOW DATESTYLE;
NOTICE:  DateStyle is SQL with US (NonEuropean) conventions
SHOW VARIABLE

In addition to these general formats, PostgreSQL’s date output format has two other variants which further describe how to display the date, shown in Table 3-19: European and non-European (U.S.). These determine whether the format is day followed by month, or vice versa. This variation can be applied on top of the previous four general formats with the same syntax to SET DATESTYLE and will not modify your chosen format except for the arrangement of the month and day.

Furthermore, you may set both the general format and day/month convention by supplying both variables to the SET command, comma delimited. The order of these variables is not important to the SET command as long as the variables are not mutually exclusive (e.g., SQL and ISO), as shown in Example 3-26.

If you do not specify a month/day format, a reasonable default will usually be chosen (e.g., European is the default for the German regional format).

While SET DATESTYLE is a convenient way to set the output format, it is important to note that this is a run-time variable, which means that it exists only for the lifespan of your connected session. There are two methods available that allow you to provide a default value for the DATESTYLE variable, which lets you avoid explicitly setting the variable for each new session you begin:

Time values, like date values, may be entered in to a table in a number of ways. Commonly used formats are listed in Table 3-20. These apply to values of type time and time with time zone.

In addition to these formats, PostgreSQL allows for further description of a time value which is defined as time with time zone by supporting extra time zone parameters following the time value. Supported formats are shown in Table 3-21.

The time with time zone data type is mainly supported by PostgreSQL to adhere to existing SQL standards and for portability with other database management systems. If you need to work with time zones, it is recommended that you use the timestamp data type discussed in the section titled Timestamps. This is primarily because of the fact that, due to daylight savings, time zones cannot always be meaningfully interpreted without an associated date.

Internally, PostgreSQL keeps track of all time zone information as a numeric offset of GMT (Greenwich Mean Time), which is also known as UTC (Universal Coordinated Time). By default, PostgreSQL’s time display will use the time zone that your server’s operating system is configured for. If you wish the time value to operate under a different time zone, there are four ways in which you can modify the output:

Set the TZ environment variable on the server

This variable is found by the backend server as the default time zone when the postmaster starts up. It can be set, for example, in the postgres user’s .bash_profile file with a bash export TZ='zone' command.

Set the PGTZ environment variable on the client

If the PGTZ environment variable is set, it can be read by any client written with libpq and interpreted as the client’s default time zone.

Use the SET TIMEZONE TO SQL statement

This SQL command sets the time zone for the session to zone (e.g., SET TIME-ZONE TO UTC)

Use the AT TIME ZONE SQL clause

This SQL92 clause can be used to specify zone as a text time zone (e.g., PST) or as an interval (e.g., interval('—07:00')). This clause may be applied in the middle of a SQL statement following a value which contains a timestamp (e.g., SELECT my_timestamp AT TIME ZONE 'PST').

The SQL92 standard specifies a data typed called an interval, which represents a fixed span of time. By itself, an interval represents only a quantity of time, and does not begin or end at any set date or time. These intervals can be useful when applied to date and time values to calculate a new date or time, either by subtracting or adding the quantity. They can also be handy for quickly determining the precise interval between two date or time values. This can be achieved by subtracting date values, time values or timestamps from one another.

The two syntax variations below can specify an interval within PostgreSQL:

qty unit  [ ago ]
qty1 unit [, qty2 unit2 ... ] [ ago ]

Where:

Example 3-27 shows functional syntax for date and interval values being meaningfully combined. You can see that subtracting an inverted time interval (e.g., one with the term ago) is functionally identical to adding a normal interval. This can be thought of as similar to the effect of adding negative numbers to integer values.

PostgreSQL supports many special constants for use when referencing dates and times. These constants represent common date/time values, such as now, tomorrow, and yesterday. The predefined date and time constants supported by PostgreSQL are listed in Table 3-23.

PostgreSQL also provides three built-in functions for retrieving the current time, date, and timestamp. These are aptly named current_date, current_time, and current_timestamp.

The now and current timestamp constants may seem to be identical, looking solely at their names. They are, however, very different in terms of storing them in a table. The now constant is translated into the timestamp of the system time at the execution of whichever command referenced it (e.g., the time of insertion, if now had been referenced in an INSERT statement). In contrast, the current constant, as it is a deferred identifier, will actually appear as the phrase current in the database. From there, it can be translated (e.g., via the to_char() function) to the timestamp associated with the transaction time of any query which requests that value.

In other words, current will always tell you the “current” time when queried, regardless of when it was stored to the table. The current constant can be used in special situations, such as process tracking, where you may need to calculate the difference between a timestamp made with now and the current date and time to find the total time the process has been running. Example 3-28 demonstrates using the now and current constants to create a log of tasks. First, a table is created to house the task’s name, its start date and time, and its finished date and time. Two tasks are then added to the table, using the now constant to set the start date and current to set the completed date. The reason this is done is to show that both of these tasks are uncompleted. If a task were to be completed, the table could be updated to show a now timestamp for that task’s timefinished column.

Therefore, you generally want to use now when storing a transaction timestamp in a table, or even the current_timestamp function, which is equivalent to the output of now. Example 3-29 shows how this could be a potentially disastrous SQL design issue if not properly understood. It shows a pair of INSERT statements; one which uses now, another which uses current. If you watch the first row returned from the two queries (the row with a current timestamp), you’ll notice it changes in each query to show the updated system time, while the second row remains the same (this is the row in which now was used).

Geometric types in PostgreSQL represent two dimensional spatial objects. These types are not standard SQL data types, and will not be discussed in depth in this book. Table 3-24 gives a brief overview of each of the available geometric types.

The original relational model specifies that the values represented by columns within a table be an atomic piece of data, object-relational database systems such as PostgreSQL allow non-atomic values to be used through data structures called arrays.

An array is a collection of data values referenced through a single identifier. The array may be a collection of values of a built-in data type or a user-defined data type, but every value in the array must be of the same type. Arrays can be accessed from a table through subscript notation via square brackets (e.g., my_array[0]). You can also use an array constant via curly braces within single quotes (e.g., '{value_one,value_two,value_three}').

In order to actually insert array values into a table column, you need a way to refer to several values as an array in a SQL statement. The formal syntax of an array constant is a grouping of values, separated by delimiters (commas, for built-in data types), enclosed by curly braces ({}), which are in turn enclosed by single quotes, as follows:

'{ value1, value2 [, ...] }'

The values in this syntax can be any valid PostgreSQL data type. As the entire array is constrained by single quotes, the use of single quotes within an array value must be escaped, just as they must be within a string constant. The use of commas to delimit the values, however, poses an interesting problem pertaining to the use of character strings which contain commas themselves, as the commas will be interpreted as delimiters if not within single-quotes. However, as just mentioned, the singles quotes constrain the array, not the array’s values.

PostgreSQL’s method of handling this is to use double-quotes to quote string constants where single-quotes would ordinarily be used outside of an array context, as follows:

'{"value1", "value 2, which contains a comma" }'

It’s vital to remember that arrays require the single quotes surrounding the curly braces in order to be interpreted correctly by PostgreSQL. You can think of array constants as being akin to a special type of string constant, which is interpreted as an array based on where it is used (e.g., when used to add records to a target column which is of an array data type). This is because unless used in an array context, a constant of the this format will be interpreted by PostgreSQL as a normal string constant (as it is bound by single quotes) which just happens to include curly braces.

PostgreSQL supports three separate conventions for type coercion (also called type casting, or explicit type casting). Type coercion is a somewhat ugly looking term which refers to a PostgreSQL method for changing a value from one data type to another. In the middle of a SQL statement, this has the net effect of explicitly creating a constant of an arbitrary type.

Generally any of the following three methods can be used in order to cast the value contained within a string constant to another type:

In the case of maintained numeric constants that you wish to cast to a character string, you will need to use one of the following syntax forms:

The value in this syntax represents the constant whose data type you wish to modify, and type represents the type that you wish to coerce, or cast, the value to.

Constants are not the only data values that may be coerced to different types. Columns of a data set returned by a SQL query may be cast by using its identifier in one of the following syntax forms:

Bear in mind that not every data type can be coerced into every other data type. For example, there is no meaningful way to convert the character string abcd into a binary bit type. Invalid casting will result in an error from PostgreSQL. Common valid casts are from character string, date/time type, or a numeric type to text, or character strings to numeric values.

In addition to these type casting conventions, there are some functions that can be called to achieve essentially the same effect as an explicit cast of any of the previously mentioned forms. These often bear the name of the type itself (such as the text() function), though others are named more specifically (such as bitfromint4()). Example 3-30 shows such a function, converting the integer 1000 to a character string of type text representing the characters 1000.

Because of conflicting semantics recognized by PostgreSQL’s parser, the type coercion format of type 'value' can only be used to specify the data type of a single value (e.g., a string constant bound by single quotes). In contrast, the other available methods of type coercion ('value'::type, CAST('value'AS type) and type conversion functions, where applicable) can be used to to specify the type of arbitrary expressions.

This is partially because attempting to follow a data type with a grouped expression (e.g., in parentheses) will cause PostgreSQL to expect a function with the name of the provided data type (which will often cause an error) while each of the other methods are syntactically valid upon grouped expressions.

booktown=# SELECT 1 + integer ('1' || '2') AS add_one_to_twelve;
ERROR:  Function 'integer(text)' does not exist
        Unable to identify a function that satisfies the given argument types
        You may need to add explicit typecasts
booktown=# SELECT 1 + ('1' || '2')::integer AS add_one_to_twelve;
 add_one_to_twelve
-------------------
              13

(1 row)


booktown=#  SELECT 1 + CAST('1' || '2' AS integer) AS add_on_to_twelve;
 add_on_to_twelve
------------------
             13

(1 row)