The SELECT
command is used to query the database
and return a result. The SELECT
command is the only SQL command capable
of returning a user-generated result, be it a table query or a
simple expression. Most consider SELECT
to be the most complex SQL command.
Although the basic format is fairly easy to understand, it does
take some experience to understand its full power.
All of Chapter 5 is devoted to the SELECT
command.
The core SELECT
command follows a simple pattern that can be roughly
described as SELECT
. The
output section describes the data that makes up the result
set, the input section describes what tables, views,
subqueries, etc., will act as data sources, and the filter
section sets up conditions on which rows are part of the
result set and which rows are filtered out.output
FROM
input
WHERE
filter
A SELECT
can either be SELECT ALL
(default) or
SELECT DISTINCT
.
The ALL
keyword returns
all rows in the result set, regardless of their
composition. The DISTINCT
keyword will force the select
statement to eliminate duplicate results in the result
set. There is usually a considerable performance penalty
for calculating larger DISTINCT
results.
The result set columns are
defined by a series of comma-separated expressions. Every
SELECT
statement
must have at least one result expression. These
expressions often consist of only a source column name,
although they can be any general expression. The character
*
means “return
all columns,” and will include all standard table columns
from all of the source tables. All the standard columns of
a specific source table can be returned with the format
.
In both cases, the table_name
.*ROWID
column will not be included,
although ROWID
alias
columns will be included. Virtual tables can also mark
some columns as hidden. Like the ROWID
column, hidden columns will not be
returned by default, but can be explicitly named as a
result column.
Result columns can be given
explicit names with an optional AS
clause (the actual AS
keyword is optional as
well). Unless an AS
clause is given, the name of the output column is at the
discretion of the database engine. If an application
depends on matching the names of specific output columns,
the columns should be given explicit names with an
AS
clause.
The FROM
clause defines where the data comes
from and how it is shuffled together. If no FROM
clause is given, the
SELECT
statement
will return only one row. Each source is joined together
with a comma or a JOIN
operation. The comma acts as an unconditional CROSS JOIN
. Different
sources, including tables, subqueries, or other JOIN
statements, can be
grouped together into a large transitory table, which is
fed through the rest of the SELECT
statement, and ultimately used to
produce the result set. For more information on the
specific JOIN
operators, see FROM Clause.
Each data source, be it a
named table or a subquery, can be given an optional
AS
clause.
Similar to result set columns, the actual AS
keyword is optional.
The AS
clause allows an
alias to be assigned to a given source. This is important
to disambiguate table instances (for example, in a
self-join).
The WHERE
clause is used to filter rows. Conceptually, the
FROM
clause,
complete with joins, is used to define a large table that
consists of every possible row combination. The WHERE
clause is evaluated
against each of those rows, passing only those rows that
evaluate to true. The WHERE
clause can also be used to define
join conditions, by effectively having the FROM
clause produce the
Cartesian product of the two tables, and use the WHERE
clause to filter out
only those rows that meet the join condition.
Beyond SELECT
, FROM
, and WHERE
, the SELECT
statement can do
additional processing with GROUP
BY
, HAVING
, ORDER
BY
, and LIMIT
.
The GROUP BY
clause allows sets of rows in the result set to be
collapsed into single rows. Groups of rows that share
equivalent values in all of the expressions listed in the
GROUP BY
clause
will be condensed to a single row. Normally, every source
column reference in the result set expressions should be a
column or expression included in the GROUP BY
clause, or the
column should appear as a parameter of an aggregate
function. The value of any other source column is the
value of the last row in the group to be processed,
effectively making the result undefined. If a GROUP BY
expression is a
literal integer, it is assumed to be a column index to the
result set. For example, the clause GROUP BY 2
would group the
result set using the values in the second result
column.
A HAVING
clause can only be used in
conjunction with a GROUP
BY
clause. Like the WHERE
clause, a HAVING
expression is used
as a row filter. The key difference is that the HAVING
expression is
applied after any GROUP
BY
manipulation. This sequence allows the
HAVING
expression
to filter aggregate outputs. Be aware that the WHERE
clause is usually
more efficient, since it can eliminate rows earlier in the
SELECT
pipeline.
If possible, filtering should be done in the WHERE
clause, saving the
HAVING
clause to
filter aggregate results.
The ORDER BY
clause sorts the result set into
a specific order. Typically, the output ordering is not
defined. Rows are returned as they become available, and
no attempt is made to return the results in any specific
order. The ORDER BY
clause can be used to enforce a specific output ordering.
Output is sorted by each expression in the clause, in
turn, from most specific to least specific. The fact that
the output of a SELECT
can be ordered is one of the key differences between an
SQL table and a result set. As with GROUP BY
, if one of the
ORDER BY
expressions is a literal integer, it is assumed to be a
column index to the result set.
Finally, the LIMIT
clause can be used
to control how many rows are returned, starting at a
specific offset. If no offset is provided, the LIMIT
will start from the
beginning of the result set. Note that the two syntax
variations (comma or OFFSET
) provide the parameters in the
opposite order.
Since the row order of a
result is undefined, a LIMIT
is most frequently used in
conjunction with an ORDER
BY
clause. Although it is not strictly
required, including an ORDER
BY
brings some meaning to the limit and
offset values. There are very few cases when it makes
sense to use a LIMIT
without some type of imposed ordering.
Compound statements
allow one or more SELECT...FROM...WHERE...GROUP BY...HAVING
substatements to be brought together using set operations.
SQLite supports the UNION
, UNION
ALL
, INTERSECT
, and EXCEPT
compound operators. Each SELECT
statement in a
compound SELECT
must
return the same number of columns. The names of the result
set columns will be taken from the first SELECT
statement.
The UNION
operator returns the union of the
SELECT
statements. By default, the UNION
operator is a proper set operator
and will only return distinct rows (including those from a
single table) . UNION
ALL
, by contrast, will return the full set
of rows returned by each SELECT
. The
UNION ALL
operator is significantly less
expensive than the UNION
operator, so the use of UNION ALL
is encouraged,
when possible.
The INTERSECT
command will return the set of
rows that appear in both SELECT
statements. Like UNION
, the INTERSECT
operator is a
proper set operation and will only return one instance of
each unique row, no matter how many times that row appears
in both result sets of the individual SELECT
statements.
The EXCEPT
compound operator acts as a
set-wise subtraction operator. All unique rows in the
first SELECT
that do
not appear in the second SELECT
will be returned.