Chapter 3. Using Presto

Congratulations! In the prior chapters, you were introduced to Presto and learned how to get it installed, configured, and started. Now you get to use it.

Presto Command-Line Interface

The Presto command-line interface (CLI) provides a terminal-based, interactive shell for running queries and for interacting with the Presto server to inspect metadata about it.

Getting Started

Just like the Presto server itself, the Presto CLI release binaries are distributed on the Maven Central Repository. The CLI application is available as an executable JAR file, which allows you to use it like a normal Unix executable.

You can see the list of available versions at https://repo.maven.apache.org/maven2/io/prestosql/presto-cli.

Locate the version of the CLI that is identical to the Presto server version you are running. Download the *-executable.jar file from the versioned directory, and rename it to presto; for example, with wget and version 330:

$ wget -O presto \
https://repo.maven.apache.org/maven2/\
io/prestosql/presto-cli/330/presto-cli-330-executable.jar

Ensure that the file is set to be executable. For convenience, make it available on the PATH; for example, by copying it to ~/bin and adding that folder to the PATH:

$ chmod +x presto
$ mv presto &#x7e;/bin
$ export PATH=~/bin/:$PATH

You can now run the Presto CLI and confirm the version:

$ presto --version
Presto CLI 330

Documentation for all available options and commands is available with the help option:

$ presto --help

Before you start using the CLI, you need to determine which Presto server you want to interact with. By default, the CLI connects to the Presto server running on http://localhost:8080. If your server is running locally for testing or development, or you access the server with SSH, or you’re using the Docker container with exec and the CLI is installed there, you are ready to go:

$ presto
presto>

If Presto is running at a different server, you have to specify the URL:

$ presto --server https://presto.example.com:8080
presto>

The presto> prompt shows that you are using the interactive console accessing the Presto server. Type help to get a list of available commands:

presto> help
Supported commands:
QUIT
EXPLAIN [ ( option [, ...] ) ] <query>
    options: FORMAT { TEXT | GRAPHVIZ | JSON }
             TYPE { LOGICAL | DISTRIBUTED | VALIDATE | IO }
DESCRIBE <table>
SHOW COLUMNS FROM <table>
SHOW FUNCTIONS
SHOW CATALOGS [LIKE <pattern>]
SHOW SCHEMAS [FROM <catalog>] [LIKE <pattern>]
SHOW TABLES [FROM <schema>] [LIKE <pattern>]
USE [<catalog>.]<schema>

Most commands, and especially all SQL statements, in the CLI need to end with a semicolon. You can find much more information about SQL on Presto in “SQL with Presto”. For now, you can just explore a few simple things to get started.

First, you can check what data sources are configured as catalogs. At a minimum, you find the internal metadata catalog—system. In our case, you also find the tpch catalog:

presto> SHOW CATALOGS;
 Catalog
---------
 system
 tpch
(2 rows)

Query 20191212_185850_00001_etmtk, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

You can easily display available schemas as well as tables in the schemas. Each time you query Presto, query processing statistics are returned, together with the result. You see them just as in the preceding code. We are going to omit them in the following listings:

presto> SHOW SCHEMAS FROM tpch;
       Schema
--------------------
 information_schema
 sf1
 sf100
 sf1000
 sf10000
 sf100000
 sf300
 sf3000
 sf30000
 tiny
(10 rows)

presto> SHOW TABLES FROM tpch.sf1;
  Table
----------
 customer
 lineitem
 nation
 orders
 part
 partsupp
 region
 supplier
(8 rows)

Now you are ready to query some actual data:

presto> SELECT count(name) FROM tpch.sf1.nation;
 _col0
-------
    25
(1 row)

Alternatively, you can select a schema to work with, and then omit the qualifier from the query:

presto> USE tpch.sf1;
USE
presto:sf1> SELECT count(name) FROM nation:

If you know that you are going to work with a specific schema, before you start the CLI, you can specify it at startup:

$ presto --catalog tpch --schema sf1

Now you are ready to exercise all your SQL knowledge and the power of Presto to query the configured data sources.

To exit out of the CLI, you can simply type quit or exit, or press Ctrl-D.

Pagination

By default, the results of queries are paginated using the less program, which is configured with a carefully selected set of options. This behavior can be overridden by setting the environment variable PRESTO_PAGER to the name of a different program such as more, or set it to an empty value to completely disable pagination.

History

The Presto CLI keeps a history of the previously used commands. You can use the up and down arrows to scroll through the history as well as Ctrl-S and Ctrl-R to search through the history. If you want to execute a query again, press Enter to execute the query.

By default, the Presto history file is located in ~/.presto_history. You can change the default with the PRESTO_HISTORY_FILE environment variable.

Additional Diagnostics

The Presto CLI provides the --debug option to enable debug information when running queries:

$ presto --debug

presto:sf1> SELECT count(*) FROM foo;
Query 20181103_201856_00022_te3wy failed:
 line 1:22: Table tpch.sf1.foo does not exist
io.prestosql.sql.analyzer.SemanticException:
 line 1:22: Table tpch.sf1.foo does not exist
...
at java.lang.Thread.run(Thread.java:748)

Executing Queries

It is possible to execute a query directly with the presto command and have the Presto CLI exit after query completion. This is often desirable if you are scripting execution of multiple queries or are automating a more complex workflow with another system. The execution returns the query results from Presto.

To run a query with the Presto CLI, use the --execute option. It is also important to fully qualify the table (for example, catalog.schema.table):

$ presto --execute 'SELECT nationkey, name, regionkey FROM tpch.sf1.nation LIMIT 5'
"0","ALGERIA","0"
"1","ARGENTINA","1"
"2","BRAZIL","1"
"3","CANADA","1"
"4","EGYPT","4"

Alternatively, use the --catalog and --schema options:

$ presto --catalog tpch --schema sf1 \
  --execute 'select nationkey, name, regionkey from nation limit 5'

You can execute multiple queries by separating the queries with a semicolon.

The Presto CLI also supports executing commands and SQL queries in a file, like nations.sql:

USE tpch.sf1;
SELECT name FROM nation;

When you use the CLI with the -f option, it returns the data on the command line and then exits:

$ presto -f nations.sql
USE
"ALGERIA"
"ARGENTINA"
"BRAZIL"
"CANADA"
...

Output Formats

The Presto CLI provides the option --output-format to control how the output is displayed when running in noninteractive mode. The available options are ALIGNED, VERTICAL, CSV, TSV, CSV_HEADER, TSV_HEADER, and NULL. The default value is CSV.

Ignoring Errors

The Presto CLI provides the option --ignore-error, if you want to skip any errors that are encountered while executing the queries in a file. The default behavior is to stop execution of the script upon encountering the first error.

Presto JDBC Driver

Presto can be accessed from any Java application using a Java Database Connectivity (JDBC) driver. JDBC is a standard API that provides the necessary methods such as querying, inserting, deleting, and updating data in a relational database. Many client and server-side applications running on the JVM implement features for database management, reporting, and other aspects and are using JDBC to access the underlying database. All of these applications can use Presto with the JDBC driver.

The Presto JDBC driver allows you to connect to Presto and interact with Presto via SQL statements.

Note

If you’re familiar with the different implementations of JDBC drivers, the Presto JDBC driver is a Type 4 driver. This simply means it talks to the Presto native protocol.

Using the JDBC driver enables you to use powerful SQL client and database administration tools, such as the open source tools DBeaver or SQuirreL SQL Client and many others. Report generation, dashboard, and analytics tools using JDBC can also be used with Presto.

The steps to use any of these tools with Presto are similar:

Download the JDBC driver.
Make the JDBC driver available on the classpath of the application.
Configure the JDBC driver.
Configure the connection to Presto.
Connect to Presto and use it.

For example, the open source database management tool DBeaver makes this process simple. After installing and starting DBeaver, follow these simple steps:

From the File menu, select New.
From the DBeaver section, select Database Connection and then click Next.
Type prestosql in the input field, select the icon, and click Next.
Configure the connection to Presto and click Finish. Note that a username value is required. You can provide a random name on a default installation of Presto without authentication.

Now you see the connection in the Database Navigator on the left and can inspect the schemas and tables, with an example displayed in Figure 3-1. You can also start the SQL Editor and start writing your queries and working with Presto.

DBeaver user interface displaying tpch.sf1.customer table columns

SQuirreL SQL Client and many other tools use a similar process. Some steps, such as downloading the JDBC driver, and configuring the database driver and connection, are more manual. Let’s look at the details.

Downloading and Registering the Driver

The Presto JDBC driver is distributed on the Maven Central Repository. The server is available as a JAR file.

You can see the list of available versions at https://repo.maven.apache.org/maven2/io/prestosql/presto-jdbc.

Determine the largest number, which represents the latest release, and navigate into the folder and download the .jar file. You can also download the archive on the command line; for example, with wget for version 330:

$ wget https://repo.maven.apache.org/maven2/\
io/prestosql/presto-jdbc/330/presto-jdbc-330.jar

To use the Presto JDBC driver in your application, you add it to the classpath of the Java application. This differs for each application but often uses a folder named lib, as is the case for SQuirreL SQL Client. Some applications include a dialog to add libraries to the classpath, which can be used alternatively to copying the file into place manually.

Loading of the driver typically requires a restart of the application.

Now you are ready to register the driver. In SQuirreL SQL Client, you can do that with the + button to create a new driver in the Drivers tab on the left of the user interface.

When configuring the driver, you need to ensure that you configure the following parameters:

Class name: io.prestosql.jdbc.PrestoDriver
Example JDBC URL: jdbc:presto://host:port/catalog/schema
Name: Presto
Website: https://prestosql.io

Only the class name, JDBC URL, and the JAR on the classpath are truly required for the driver to operate. Other parameters are optional and depend on the application.

Establishing a Connection to Presto

With the driver registered and Presto up and running, you can now connect to it from your application.

In SQuirreL SQL Client, this connection configuration is called an alias. You can use the Alias tab on the left of the user interface and the + button to create a new alias with the following parameters:

Name: A descriptive name for the Presto connection. The name is more important if you end up connecting to multiple Presto instances, or different schemas and databases.
Driver: Select the Presto driver you created earlier.
URL

The JDBC driver can receive further parameters as properties. The mechanism for providing these values depends on the application. Both DBeaver and SQuirreL SQL Client include a user interface to specify properties as part of the connection configuration:

SSL: Enable SSL usage of the connection, true or false.
SSLTrustStorePath: Path to the SSL truststore.
SSLTrustStorePassword: Password for the SSL truststore.
user and password: Equivalent to the username and password parameters.
applicationNamePrefix: Property used to identify the application to Presto. This is used to set the source name for the Presto query. This name is displayed in the Presto Web UI so that administrators can see where the query originated. Furthermore, it can be used in conjunction with resource groups in which you can use the ApplicationName to decide how to assign resources in Presto. This is discussed in “Resource Groups”.

The full list of available parameters for the JDBC drivers can be found in the Presto documentation; see “Documentation”.

Once you have configured the connection, you can use it to connect to Presto. This enables you to query Presto itself and all configured schemas and databases. The specific features available for query execution or report generation or any other functionality depend on the application connected to Presto. Figure 3-2 shows a successful connection to Presto in SQuirreL SQL Client with some example queries and a result set.

SQuirreL SQL user interface displaying queries and result set

Presto and ODBC

Similar to the connection to Presto with the JDBC driver—“Presto JDBC Driver”—Open Database Connectivity (ODBC) allows any application supporting ODBC to use Presto. It provides an API for typically C-based applications.

Currently, no open source ODBC driver for Presto is available. However, commercial drivers can be purchased from Starburst and Simba.

This enables several popular applications from the database administration, business intelligence, and reporting and analytics space, such as Microsoft Power BI, Tableau, SAS, Quest Toad, and others. ODBC also enables Microsoft Excel usage.

Client Libraries

Besides the Presto CLI and the JDBC driver, maintained by the Presto team directly, numerous members of the larger Presto community have created client libraries for Presto.

You can find libraries for Python, C, Go, Node.js, R, Ruby, and others. A list is maintained on the Presto website discussed in “Website”.

These libraries can be used to integrate Presto with applications in these language ecosystems, including your own applications.

Presto Web UI

Every Presto server provides a web interface, commonly referred to as the Presto Web UI. The Presto Web UI, shown in Figure 3-3, exposes details about the Presto server and query processing on the server.

The Presto Web UI is accessible at the same address as the Presto server, using the same HTTP port number. By default, this port is 8080; for example, http://presto.example.com:8080. So on your local installation, you can check out the Web UI at http://localhost:8080.

The main dashboard shows details about the Presto utilization and a list of queries. Further details are available in the UI. All this information is of great value for operating Presto and managing the running queries.

Using the Web UI is very useful for monitoring Presto and tuning performance, as explained in more detail in “Monitoring with the Presto Web UI”. As a beginner user, it is mostly useful to confirm that the server is up and running and is processing your queries.

Presto Web UI display of high level information about the cluster

SQL with Presto

Presto is an ANSI SQL-compliant query engine. It allows you to query and manipulate data in any connected data source with the same SQL statements, functions, and operators.

Presto strives to be compliant with existing SQL standards. One of the main design principles of Presto is to neither invent another SQL-like query language nor deviate too much from the SQL standard. Every new functionality and language feature attempts to comply with the standard.

Extending the SQL feature set is considered only when the standard does not define an equivalent functionality. And even then, great care is taken to design the feature by considering similar features in the standard and other existing SQL implementations as a sign of what can become standard in the future.

Note

In rare cases, when the standard does not define an equivalent functionality, Presto extends the standard. A prominent example are Lambda expressions; see “Lambda Expressions”.

Presto does not define any particular SQL standard version it complies with. Instead, the standard is treated as a living document, and the newest standard version is always considered important. On the other hand, Presto does not yet implement all the mandatory features defined in the SQL standard. As a rule, if an existing functionality is found as noncompliant, it is deprecated and later replaced with a standard compliant one.

Querying Presto can be done with the Presto CLI as well as any database management tool connected with JDBC or ODBC, as discussed earlier.

Concepts

Presto enables SQL-based access to external data sources such as relational databases, key-value stores, and object storage. The following concepts are important to understand in Presto:

Connector: Adapts Presto to a data source. Every catalog is associated with a specific connector.
Catalog: Defines details for accessing a data source; contains schemas and configures a specific connector to use.
Schema: A way to organize tables. A catalog and schema together define a set of tables that can be queried.
Table: A set of unordered rows, which are organized into named columns with data types.

First Examples

This section presents a short overview of supported SQL and Presto statements, with much more detail available in Chapter 8 and Chapter 9.

Presto metadata is contained in the system catalog. Specific statements can be used to query that data and, in general, find out more about the available catalogs, schemas, information schemas, tables, functions, and more.

Use the following to list all catalogs:

SHOW CATALOGS;
 Catalog
 ---------
 system
 tpch
 (2 rows)

Show all schemas in the tpch catalog as follows:

SHOW SCHEMAS FROM tpch;
       Schema
 ------------------
 information_schema
 sf1
 sf100
 sf1000
 sf10000
 sf100000
 sf300
 sf3000
 sf30000
 tiny
(10 rows)

Here’s how to list the tables in the sf1 catalog:

SHOW TABLES FROM tpch.sf1;
  Table
 ---------
 customer
 lineitem
 nation
 orders
 part
 partsupp
 region
 supplier
(8 rows)

Find out about the data in the region table as follows:

DESCRIBE tpch.sf1.region;
  Column   |     Type     | Extra | Comment
-----------+--------------+-------+---------
 regionkey | bigint       |       |
 name      | varchar(25)  |       |
 comment   | varchar(152) |       |
(3 rows)

Other useful statements, such as USE and SHOW FUNCTIONS, are available. More information about the system catalog and Presto statements is available in “Presto Statements”.

With the knowledge of the available catalogs, schemas, and tables, you can use standard SQL to query the data.

You can check what regions are available:

SELECT name FROM tpch.sf1.region;
    name
 ------------
 AFRICA
 AMERICA
 ASIA
 EUROPE
 MIDDLE EAST
(5 rows)

You can return a subset and order the list:

 SELECT name
 FROM tpch.sf1.region
 WHERE name like 'A%'
 ORDER BY name DESC;
  name
 ---------
 ASIA
 AMERICA
 AFRICA
(3 rows)

Joining multiple tables and other parts of the SQL standard are supported as well:

SELECT nation.name AS nation, region.name AS region
FROM tpch.sf1.region, tpch.sf1.nation
WHERE region.regionkey = nation.regionkey
AND region.name LIKE 'AFRICA';
   nation   | region
------------+--------
 MOZAMBIQUE | AFRICA
 MOROCCO    | AFRICA
 KENYA      | AFRICA
 ETHIOPIA   | AFRICA
 ALGERIA    | AFRICA
(5 rows)

Presto supports operators like || for string concatenation. You can also use mathematical operators such as + and -.

You can change the preceding query to use JOIN and concatenate the result string to one field:

SELECT nation.name || ' / ' || region.name AS Location
FROM tpch.sf1.region JOIN tpch.sf1.nation
ON region.regionkey = nation.regionkey
AND region.name LIKE 'AFRICA';
      Location
 -------------------
 MOZAMBIQUE / AFRICA
 MOROCCO / AFRICA
 KENYA / AFRICA
 ETHIOPIA / AFRICA
 ALGERIA / AFRICA
(5 rows)

In addition to the operators, Presto supports a large variety of functions. They range from simple use cases to very complex functionality. You can display a list in Presto by using SHOW FUNCTIONS.

A simple example is to calculate the average prices of all orders and display the rounded integer value:

SELECT round(avg(totalprice)) AS average_price
FROM tpch.sf1.orders;
 average_price
 --------------
     151220.0
(1 row)

More details about SQL usage are available in the Presto documentation and in Chapter 8. Information about functions and operators is also available on the website, and you can find a good overview with more examples in Chapter 9.

Conclusion

Presto is up and running. You connected a data source and used SQL to query it. You can use the Presto CLI, or applications connected to Presto with JDBC.

With this powerful combination in place, you are ready to dive deeper. In the next chapters, we are going to do exactly that: learn how to install Presto for a larger production deployment, understand the architecture of Presto, and get into the details about SQL usage.

Note

Figure 3-1. DBeaver user interface displaying tpch.sf1.customer table columns

Downloading and Registering the Driver

Establishing a Connection to Presto

Figure 3-2. SQuirreL SQL Client user interface displaying queries and result set

Chapter 3. Using Presto

Presto Command-Line Interface

Getting Started

Pagination

History

Additional Diagnostics

Executing Queries

Output Formats

Ignoring Errors

Presto JDBC Driver

Presto and ODBC

Client Libraries

Presto Web UI

Figure 3-3. Presto Web UI display of high-level information about the cluster

SQL with Presto

Note

Concepts

First Examples

Conclusion