Congratulations! In the prior chapters, you were introduced to Presto and learned how to get it installed, configured, and started. Now you get to use it.
The Presto command-line interface (CLI) provides a terminal-based, interactive shell for running queries and for interacting with the Presto server to inspect metadata about it.
Just like the Presto server itself, the Presto CLI release binaries are distributed on the Maven Central Repository. The CLI application is available as an executable JAR file, which allows you to use it like a normal Unix executable.
You can see the list of available versions at https://repo.maven.apache.org/maven2/io/prestosql/presto-cli.
Locate the version of the CLI that is identical to the Presto server version you
are running. Download the *-executable.jar file from the versioned
directory, and rename it to presto
; for example, with wget
and version 330:
$
wget -O presto\
https://repo.maven.apache.org/maven2/\
io/prestosql/presto-cli/330/presto-cli-330-executable.jar
Ensure that the file is set to be executable. For convenience, make it available
on the PATH
; for example, by copying it to ~/bin and adding that folder to
the PATH
:
$
chmod +x presto$
mv presto&
#x7e;/bin
$
export
PATH
=
~/bin/:$PATH
You can now run the Presto CLI and confirm the version:
$
presto --version
Presto CLI 330
Documentation for all available options and commands is available with the
help
option:
$
presto --help
Before you start using the CLI, you need to determine which Presto server you
want to interact with. By default, the CLI connects to the Presto server running
on http://localhost:8080. If your server is running locally for testing or
development, or you access the server with SSH, or you’re using the Docker
container with exec
and the CLI is installed there, you are ready to go:
$
presto
presto>
If Presto is running at a different server, you have to specify the URL:
$
presto --server https://presto.example.com:8080
presto>
The presto>
prompt shows that you are using the interactive console accessing
the Presto server. Type help
to get a list of available commands:
presto> help Supported commands: QUIT EXPLAIN [ ( option [, ...] ) ] <query> options: FORMAT { TEXT | GRAPHVIZ | JSON } TYPE { LOGICAL | DISTRIBUTED | VALIDATE | IO } DESCRIBE <table> SHOW COLUMNS FROM <table> SHOW FUNCTIONS SHOW CATALOGS [LIKE <pattern>] SHOW SCHEMAS [FROM <catalog>] [LIKE <pattern>] SHOW TABLES [FROM <schema>] [LIKE <pattern>] USE [<catalog>.]<schema>
Most commands, and especially all SQL statements, in the CLI need to end with a semicolon. You can find much more information about SQL on Presto in “SQL with Presto”. For now, you can just explore a few simple things to get started.
First, you can check what data sources are configured as catalogs. At a minimum,
you find the internal metadata catalog—system
. In our case, you also
find the tpch
catalog:
presto> SHOW CATALOGS;
Catalog --------- system tpch(
2
rows)
Query 20191212_185850_00001_etmtk, FINISHED,1
node Splits:19
total,19
done
(
100.00%)
0:01[
0
rows, 0B]
[
0
rows/s, 0B/s]
You can easily display available schemas as well as tables in the schemas. Each time you query Presto, query processing statistics are returned, together with the result. You see them just as in the preceding code. We are going to omit them in the following listings:
presto> SHOW SCHEMAS FROM tpch;
Schema -------------------- information_schema sf1 sf100 sf1000 sf10000 sf100000 sf300 sf3000 sf30000 tiny(
10
rows)
presto> SHOW TABLES FROM tpch.sf1;
Table ---------- customer lineitem nation orders part partsupp region supplier(
8
rows)
Now you are ready to query some actual data:
presto> SELECT count(
name)
FROM tpch.sf1.nation;
_col0 ------- 25(
1
row)
Alternatively, you can select a schema to work with, and then omit the qualifier from the query:
presto> USE tpch.sf1;
USE presto:sf1> SELECT count(
name)
FROM nation:
If you know that you are going to work with a specific schema, before you start the CLI, you can specify it at startup:
$
presto --catalog tpch --schema sf1
Now you are ready to exercise all your SQL knowledge and the power of Presto to query the configured data sources.
To exit out of the CLI, you can simply type quit
or exit
, or press Ctrl-D.
By default, the results of queries are paginated using the less
program, which
is configured with a carefully selected set of options. This behavior can be
overridden by setting the environment variable PRESTO_PAGER
to the name of a
different program such as more
, or set it to an empty value to completely
disable pagination.
The Presto CLI keeps a history of the previously used commands. You can use the up and down arrows to scroll through the history as well as Ctrl-S and Ctrl-R to search through the history. If you want to execute a query again, press Enter to execute the query.
By default, the Presto history file is located in ~/.presto_history. You can
change the default with the PRESTO_HISTORY_FILE
environment variable.
The Presto CLI provides the --debug
option to enable debug information when
running queries:
$
presto --debug presto:sf1> SELECT count(
*)
FROM foo;
Query 20181103_201856_00022_te3wy failed: line 1:22: Table tpch.sf1.foo does not exist io.prestosql.sql.analyzer.SemanticException: line 1:22: Table tpch.sf1.foo does not exist ... at java.lang.Thread.run(
Thread.java:748)
It is possible to execute a query directly with the presto
command and have
the Presto CLI exit after query completion. This is often desirable if you are
scripting execution of multiple queries or are automating a more complex
workflow with another system. The execution returns the query results from
Presto.
To run a query with the Presto CLI, use the --execute
option. It is also
important to fully qualify the table (for example, catalog.schema.table
):
$
presto --execute'SELECT nationkey, name, regionkey FROM tpch.sf1.nation LIMIT 5'
"0"
,"ALGERIA"
,"0"
"1"
,"ARGENTINA"
,"1"
"2"
,"BRAZIL"
,"1"
"3"
,"CANADA"
,"1"
"4"
,"EGYPT"
,"4"
Alternatively, use the --catalog
and --schema
options:
$
presto --catalog tpch --schema sf1\
--execute'select nationkey, name, regionkey from nation limit 5'
You can execute multiple queries by separating the queries with a semicolon.
The Presto CLI also supports executing commands and SQL queries in a file, like nations.sql:
USE
tpch
.
sf1
;
SELECT
name
FROM
nation
;
When you use the CLI with the -f
option, it returns the data on the command line and
then exits:
$
presto -f nations.sql USE"ALGERIA"
"ARGENTINA"
"BRAZIL"
"CANADA"
...
Presto can be accessed from any Java application using a Java Database Connectivity (JDBC) driver. JDBC is a standard API that provides the necessary methods such as querying, inserting, deleting, and updating data in a relational database. Many client and server-side applications running on the JVM implement features for database management, reporting, and other aspects and are using JDBC to access the underlying database. All of these applications can use Presto with the JDBC driver.
The Presto JDBC driver allows you to connect to Presto and interact with Presto via SQL statements.
If you’re familiar with the different implementations of JDBC drivers, the Presto JDBC driver is a Type 4 driver. This simply means it talks to the Presto native protocol.
Using the JDBC driver enables you to use powerful SQL client and database administration tools, such as the open source tools DBeaver or SQuirreL SQL Client and many others. Report generation, dashboard, and analytics tools using JDBC can also be used with Presto.
The steps to use any of these tools with Presto are similar:
Download the JDBC driver.
Make the JDBC driver available on the classpath of the application.
Configure the JDBC driver.
Configure the connection to Presto.
Connect to Presto and use it.
For example, the open source database management tool DBeaver makes this process simple. After installing and starting DBeaver, follow these simple steps:
From the File menu, select New.
From the DBeaver section, select Database Connection and then click Next.
Type prestosql
in the input field, select the icon, and click Next.
Configure the connection to Presto and click Finish. Note that a username value is required. You can provide a random name on a default installation of Presto without authentication.
Now you see the connection in the Database Navigator on the left and can inspect the schemas and tables, with an example displayed in Figure 3-1. You can also start the SQL Editor and start writing your queries and working with Presto.
SQuirreL SQL Client and many other tools use a similar process. Some steps, such as downloading the JDBC driver, and configuring the database driver and connection, are more manual. Let’s look at the details.
The Presto JDBC driver is distributed on the Maven Central Repository. The server is available as a JAR file.
You can see the list of available versions at https://repo.maven.apache.org/maven2/io/prestosql/presto-jdbc.
Determine the largest number, which represents the latest release, and navigate
into the folder and download the .jar file. You can also download the archive
on the command line; for example, with wget
for version 330:
$
wget https://repo.maven.apache.org/maven2/\
io/prestosql/presto-jdbc/330/presto-jdbc-330.jar
To use the Presto JDBC driver in your application, you add it to the classpath of the Java application. This differs for each application but often uses a folder named lib, as is the case for SQuirreL SQL Client. Some applications include a dialog to add libraries to the classpath, which can be used alternatively to copying the file into place manually.
Loading of the driver typically requires a restart of the application.
Now you are ready to register the driver. In SQuirreL SQL Client, you can do
that with the +
button to create a new driver in the Drivers tab on the left
of the user
interface.
When configuring the driver, you need to ensure that you configure the following parameters:
Class name: io.prestosql.jdbc.PrestoDriver
Example JDBC URL: jdbc:presto://host:port/catalog/schema
Name: Presto
Website: https://prestosql.io
Only the class name, JDBC URL, and the JAR on the classpath are truly required for the driver to operate. Other parameters are optional and depend on the application.
With the driver registered and Presto up and running, you can now connect to it from your application.
In SQuirreL SQL Client, this connection configuration is called an alias. You can
use the Alias tab on the left of the user interface and the +
button to
create a new alias with the following parameters:
A descriptive name for the Presto connection. The name is more important if you end up connecting to multiple Presto instances, or different schemas and databases.
Select the Presto driver you created earlier.
The JDBC URL uses the pattern jdbc:presto://host:port/catalog/schema, with catalog and schema values optional. You can connect to Presto, installed earlier on your machine and running on http://localhost:8080, with the JDBC URL jdbc:presto://localhost:8080. The host parameter is the host where the Presto coordinator is running. It is the same hostname you use when connecting via the Presto CLI. This can be in the form of an IP address or DNS hostname. The port parameter is the HTTP port to connect to Presto on the host. The optional catalog and schema parameters are used to establish a connection by using the catalog and schema specified. When you specify these, you do not have to fully qualify the table names in the queries.
A username value is required, even when no authentication is configured on Presto. This allows Presto to report the initiator for any queries.
The password is associated with the user and used for authentication. No password is required for a default installation of Presto without configured authentication.
The JDBC driver can receive further parameters as properties. The mechanism for providing these values depends on the application. Both DBeaver and SQuirreL SQL Client include a user interface to specify properties as part of the connection configuration:
SSL
SSLTrustStorePath
SSLTrustStorePassword
user
and password
applicationNamePrefix
Property used to identify the application to Presto. This is used to set the
source name for the Presto query. This name is displayed in the Presto Web UI
so that administrators can see where the query originated. Furthermore, it can
be used in conjunction with resource groups in which you can use the
ApplicationName
to decide how to assign resources in Presto. This is discussed
in “Resource Groups”.
The full list of available parameters for the JDBC drivers can be found in the Presto documentation; see “Documentation”.
Once you have configured the connection, you can use it to connect to Presto. This enables you to query Presto itself and all configured schemas and databases. The specific features available for query execution or report generation or any other functionality depend on the application connected to Presto. Figure 3-2 shows a successful connection to Presto in SQuirreL SQL Client with some example queries and a result set.
Similar to the connection to Presto with the JDBC driver—“Presto JDBC Driver”—Open Database Connectivity (ODBC) allows any application supporting ODBC to use Presto. It provides an API for typically C-based applications.
Currently, no open source ODBC driver for Presto is available. However, commercial drivers can be purchased from Starburst and Simba.
This enables several popular applications from the database administration, business intelligence, and reporting and analytics space, such as Microsoft Power BI, Tableau, SAS, Quest Toad, and others. ODBC also enables Microsoft Excel usage.
Besides the Presto CLI and the JDBC driver, maintained by the Presto team directly, numerous members of the larger Presto community have created client libraries for Presto.
You can find libraries for Python, C, Go, Node.js, R, Ruby, and others. A list is maintained on the Presto website discussed in “Website”.
These libraries can be used to integrate Presto with applications in these language ecosystems, including your own applications.
Every Presto server provides a web interface, commonly referred to as the Presto Web UI. The Presto Web UI, shown in Figure 3-3, exposes details about the Presto server and query processing on the server.
The Presto Web UI is accessible at the same address as the Presto server, using the same HTTP port number. By default, this port is 8080; for example, http://presto.example.com:8080. So on your local installation, you can check out the Web UI at http://localhost:8080.
The main dashboard shows details about the Presto utilization and a list of queries. Further details are available in the UI. All this information is of great value for operating Presto and managing the running queries.
Using the Web UI is very useful for monitoring Presto and tuning performance, as explained in more detail in “Monitoring with the Presto Web UI”. As a beginner user, it is mostly useful to confirm that the server is up and running and is processing your queries.
Presto is an ANSI SQL-compliant query engine. It allows you to query and manipulate data in any connected data source with the same SQL statements, functions, and operators.
Presto strives to be compliant with existing SQL standards. One of the main design principles of Presto is to neither invent another SQL-like query language nor deviate too much from the SQL standard. Every new functionality and language feature attempts to comply with the standard.
Extending the SQL feature set is considered only when the standard does not define an equivalent functionality. And even then, great care is taken to design the feature by considering similar features in the standard and other existing SQL implementations as a sign of what can become standard in the future.
In rare cases, when the standard does not define an equivalent functionality, Presto extends the standard. A prominent example are Lambda expressions; see “Lambda Expressions”.
Presto does not define any particular SQL standard version it complies with. Instead, the standard is treated as a living document, and the newest standard version is always considered important. On the other hand, Presto does not yet implement all the mandatory features defined in the SQL standard. As a rule, if an existing functionality is found as noncompliant, it is deprecated and later replaced with a standard compliant one.
Querying Presto can be done with the Presto CLI as well as any database management tool connected with JDBC or ODBC, as discussed earlier.
Presto enables SQL-based access to external data sources such as relational databases, key-value stores, and object storage. The following concepts are important to understand in Presto:
Adapts Presto to a data source. Every catalog is associated with a specific connector.
Defines details for accessing a data source; contains schemas and configures a specific connector to use.
A way to organize tables. A catalog and schema together define a set of tables that can be queried.
A set of unordered rows, which are organized into named columns with data types.
This section presents a short overview of supported SQL and Presto statements, with much more detail available in Chapter 8 and Chapter 9.
Presto metadata is contained in the system
catalog. Specific
statements can be used to query that data and, in general, find out more about the
available catalogs, schemas, information schemas, tables, functions, and more.
Use the following to list all catalogs:
SHOW
CATALOGS
;
Catalog
---------
system
tpch
(
2
rows
)
Show all schemas in the tpch
catalog as follows:
SHOW
SCHEMAS
FROM
tpch
;
Schema
------------------
information_schema
sf1
sf100
sf1000
sf10000
sf100000
sf300
sf3000
sf30000
tiny
(
10
rows
)
Here’s how to list the tables in the sf1
catalog:
SHOW
TABLES
FROM
tpch
.
sf1
;
Table
---------
customer
lineitem
nation
orders
part
partsupp
region
supplier
(
8
rows
)
Find out about the data in the region
table as follows:
DESCRIBE
tpch
.
sf1
.
region
;
Column
|
Type
|
Extra
|
Comment
-----------+--------------+-------+---------
regionkey
|
bigint
|
|
name
|
varchar
(
25
)
|
|
comment
|
varchar
(
152
)
|
|
(
3
rows
)
Other useful statements, such as USE
and SHOW FUNCTIONS
, are available. More
information about the system catalog and Presto statements is available in
“Presto Statements”.
With the knowledge of the available catalogs, schemas, and tables, you can use standard SQL to query the data.
You can check what regions are available:
SELECT
name
FROM
tpch
.
sf1
.
region
;
name
------------
AFRICA
AMERICA
ASIA
EUROPE
MIDDLE
EAST
(
5
rows
)
You can return a subset and order the list:
SELECT
name
FROM
tpch
.
sf1
.
region
WHERE
name
like
'A%'
ORDER
BY
name
DESC
;
name
---------
ASIA
AMERICA
AFRICA
(
3
rows
)
Joining multiple tables and other parts of the SQL standard are supported as well:
SELECT
nation
.
name
AS
nation
,
region
.
name
AS
region
FROM
tpch
.
sf1
.
region
,
tpch
.
sf1
.
nation
WHERE
region
.
regionkey
=
nation
.
regionkey
AND
region
.
name
LIKE
'AFRICA'
;
nation
|
region
------------+--------
MOZAMBIQUE
|
AFRICA
MOROCCO
|
AFRICA
KENYA
|
AFRICA
ETHIOPIA
|
AFRICA
ALGERIA
|
AFRICA
(
5
rows
)
Presto supports operators like ||
for string concatenation. You can also use
mathematical operators such as +
and -
.
You can change the preceding query to use JOIN
and concatenate the result string
to one field:
SELECT
nation
.
name
||
' / '
||
region
.
name
AS
Location
FROM
tpch
.
sf1
.
region
JOIN
tpch
.
sf1
.
nation
ON
region
.
regionkey
=
nation
.
regionkey
AND
region
.
name
LIKE
'AFRICA'
;
Location
-------------------
MOZAMBIQUE
/
AFRICA
MOROCCO
/
AFRICA
KENYA
/
AFRICA
ETHIOPIA
/
AFRICA
ALGERIA
/
AFRICA
(
5
rows
)
In addition to the operators, Presto supports a large variety of functions. They
range from simple use cases to very complex functionality. You can
display a list in Presto by using SHOW FUNCTIONS
.
A simple example is to calculate the average prices of all orders and display the rounded integer value:
SELECT round(avg(totalprice)) AS average_price FROM tpch.sf1.orders; average_price -------------- 151220.0 (1 row)
More details about SQL usage are available in the Presto documentation and in Chapter 8. Information about functions and operators is also available on the website, and you can find a good overview with more examples in Chapter 9.
Presto is up and running. You connected a data source and used SQL to query it. You can use the Presto CLI, or applications connected to Presto with JDBC.
With this powerful combination in place, you are ready to dive deeper. In the next chapters, we are going to do exactly that: learn how to install Presto for a larger production deployment, understand the architecture of Presto, and get into the details about SQL usage.