Chapter 3. Using Presto

Congratulations! In the prior chapters, you were introduced to Presto and learned how to get it installed, configured, and started. Now you get to use it.

Presto Command-Line Interface

The Presto command-line interface (CLI) provides a terminal-based, interactive shell for running queries and for interacting with the Presto server to inspect metadata about it.

Getting Started

Just like the Presto server itself, the Presto CLI release binaries are distributed on the Maven Central Repository. The CLI application is available as an executable JAR file, which allows you to use it like a normal Unix executable.

You can see the list of available versions at https://repo.maven.apache.org/maven2/io/prestosql/presto-cli.

Locate the version of the CLI that is identical to the Presto server version you are running. Download the *-executable.jar file from the versioned directory, and rename it to presto; for example, with wget and version 330:

$ wget -O presto \
https://repo.maven.apache.org/maven2/\
io/prestosql/presto-cli/330/presto-cli-330-executable.jar

Ensure that the file is set to be executable. For convenience, make it available on the PATH; for example, by copying it to ~/bin and adding that folder to the PATH:

$ chmod +x presto
$ mv presto ~/bin
$ export PATH=~/bin/:$PATH

You can now run the Presto CLI and confirm the version:

$ presto --version
Presto CLI 330

Documentation for all available options and commands is available with the help option:

$ presto --help

Before you start using the CLI, you need to determine which Presto server you want to interact with. By default, the CLI connects to the Presto server running on http://localhost:8080. If your server is running locally for testing or development, or you access the server with SSH, or you’re using the Docker container with exec and the CLI is installed there, you are ready to go:

$ presto
presto>

If Presto is running at a different server, you have to specify the URL:

$ presto --server https://presto.example.com:8080
presto>

The presto> prompt shows that you are using the interactive console accessing the Presto server. Type help to get a list of available commands:

presto> help
Supported commands:
QUIT
EXPLAIN [ ( option [, ...] ) ] <query>
    options: FORMAT { TEXT | GRAPHVIZ | JSON }
             TYPE { LOGICAL | DISTRIBUTED | VALIDATE | IO }
DESCRIBE <table>
SHOW COLUMNS FROM <table>
SHOW FUNCTIONS
SHOW CATALOGS [LIKE <pattern>]
SHOW SCHEMAS [FROM <catalog>] [LIKE <pattern>]
SHOW TABLES [FROM <schema>] [LIKE <pattern>]
USE [<catalog>.]<schema>

Most commands, and especially all SQL statements, in the CLI need to end with a semicolon. You can find much more information about SQL on Presto in “SQL with Presto”. For now, you can just explore a few simple things to get started.

First, you can check what data sources are configured as catalogs. At a minimum, you find the internal metadata catalog—system. In our case, you also find the tpch catalog:

presto> SHOW CATALOGS;
 Catalog
---------
 system
 tpch
(2 rows)

Query 20191212_185850_00001_etmtk, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

You can easily display available schemas as well as tables in the schemas. Each time you query Presto, query processing statistics are returned, together with the result. You see them just as in the preceding code. We are going to omit them in the following listings:

presto> SHOW SCHEMAS FROM tpch;
       Schema
--------------------
 information_schema
 sf1
 sf100
 sf1000
 sf10000
 sf100000
 sf300
 sf3000
 sf30000
 tiny
(10 rows)

presto> SHOW TABLES FROM tpch.sf1;
  Table
----------
 customer
 lineitem
 nation
 orders
 part
 partsupp
 region
 supplier
(8 rows)

Now you are ready to query some actual data:

presto> SELECT count(name) FROM tpch.sf1.nation;
 _col0
-------
    25
(1 row)

Alternatively, you can select a schema to work with, and then omit the qualifier from the query:

presto> USE tpch.sf1;
USE
presto:sf1> SELECT count(name) FROM nation:

If you know that you are going to work with a specific schema, before you start the CLI, you can specify it at startup:

$ presto --catalog tpch --schema sf1

Now you are ready to exercise all your SQL knowledge and the power of Presto to query the configured data sources.

To exit out of the CLI, you can simply type quit or exit, or press Ctrl-D.

Presto JDBC Driver

Presto can be accessed from any Java application using a Java Database Connectivity (JDBC) driver. JDBC is a standard API that provides the necessary methods such as querying, inserting, deleting, and updating data in a relational database. Many client and server-side applications running on the JVM implement features for database management, reporting, and other aspects and are using JDBC to access the underlying database. All of these applications can use Presto with the JDBC driver.

The Presto JDBC driver allows you to connect to Presto and interact with Presto via SQL statements.

Note

If you’re familiar with the different implementations of JDBC drivers, the Presto JDBC driver is a Type 4 driver. This simply means it talks to the Presto native protocol.

Using the JDBC driver enables you to use powerful SQL client and database administration tools, such as the open source tools DBeaver or SQuirreL SQL Client and many others. Report generation, dashboard, and analytics tools using JDBC can also be used with Presto.

The steps to use any of these tools with Presto are similar:

  1. Download the JDBC driver.

  2. Make the JDBC driver available on the classpath of the application.

  3. Configure the JDBC driver.

  4. Configure the connection to Presto.

  5. Connect to Presto and use it.

For example, the open source database management tool DBeaver makes this process simple. After installing and starting DBeaver, follow these simple steps:

  1. From the File menu, select New.

  2. From the DBeaver section, select Database Connection and then click Next.

  3. Type prestosql in the input field, select the icon, and click Next.

  4. Configure the connection to Presto and click Finish. Note that a username value is required. You can provide a random name on a default installation of Presto without authentication.

Now you see the connection in the Database Navigator on the left and can inspect the schemas and tables, with an example displayed in Figure 3-1. You can also start the SQL Editor and start writing your queries and working with Presto.

DBeaver user interface displaying tpch.sf1.customer table columns
Figure 3-1. DBeaver user interface displaying tpch.sf1.customer table columns

SQuirreL SQL Client and many other tools use a similar process. Some steps, such as downloading the JDBC driver, and configuring the database driver and connection, are more manual. Let’s look at the details.

Downloading and Registering the Driver

The Presto JDBC driver is distributed on the Maven Central Repository. The server is available as a JAR file.

You can see the list of available versions at https://repo.maven.apache.org/maven2/io/prestosql/presto-jdbc.

Determine the largest number, which represents the latest release, and navigate into the folder and download the .jar file. You can also download the archive on the command line; for example, with wget for version 330:

$ wget https://repo.maven.apache.org/maven2/\
io/prestosql/presto-jdbc/330/presto-jdbc-330.jar

To use the Presto JDBC driver in your application, you add it to the classpath of the Java application. This differs for each application but often uses a folder named lib, as is the case for SQuirreL SQL Client. Some applications include a dialog to add libraries to the classpath, which can be used alternatively to copying the file into place manually.

Loading of the driver typically requires a restart of the application.

Now you are ready to register the driver. In SQuirreL SQL Client, you can do that with the + button to create a new driver in the Drivers tab on the left of the user interface.

When configuring the driver, you need to ensure that you configure the following parameters:

  • Class name: io.prestosql.jdbc.PrestoDriver

  • Example JDBC URL: jdbc:presto://host:port/catalog/schema

  • Name: Presto

  • Website: https://prestosql.io

Only the class name, JDBC URL, and the JAR on the classpath are truly required for the driver to operate. Other parameters are optional and depend on the application.

Establishing a Connection to Presto

With the driver registered and Presto up and running, you can now connect to it from your application.

In SQuirreL SQL Client, this connection configuration is called an alias. You can use the Alias tab on the left of the user interface and the + button to create a new alias with the following parameters:

Name

A descriptive name for the Presto connection. The name is more important if you end up connecting to multiple Presto instances, or different schemas and databases.

Driver

Select the Presto driver you created earlier.

URL

The JDBC URL uses the pattern jdbc:presto://host:port/catalog/schema, with catalog and schema values optional. You can connect to Presto, installed earlier on your machine and running on http://localhost:8080, with the JDBC URL jdbc:presto://localhost:8080. The host parameter is the host where the Presto coordinator is running. It is the same hostname you use when connecting via the Presto CLI. This can be in the form of an IP address or DNS hostname. The port parameter is the HTTP port to connect to Presto on the host. The optional catalog and schema parameters are used to establish a connection by using the catalog and schema specified. When you specify these, you do not have to fully qualify the table names in the queries.

Username

A username value is required, even when no authentication is configured on Presto. This allows Presto to report the initiator for any queries.

Password

The password is associated with the user and used for authentication. No password is required for a default installation of Presto without configured authentication.

The JDBC driver can receive further parameters as properties. The mechanism for providing these values depends on the application. Both DBeaver and SQuirreL SQL Client include a user interface to specify properties as part of the connection configuration:

SSL

Enable SSL usage of the connection, true or false.

SSLTrustStorePath

Path to the SSL truststore.

SSLTrustStorePassword

Password for the SSL truststore.

user and password

Equivalent to the username and password parameters.

applicationNamePrefix

Property used to identify the application to Presto. This is used to set the source name for the Presto query. This name is displayed in the Presto Web UI so that administrators can see where the query originated. Furthermore, it can be used in conjunction with resource groups in which you can use the ApplicationName to decide how to assign resources in Presto. This is discussed in “Resource Groups”.

The full list of available parameters for the JDBC drivers can be found in the Presto documentation; see “Documentation”.

Once you have configured the connection, you can use it to connect to Presto. This enables you to query Presto itself and all configured schemas and databases. The specific features available for query execution or report generation or any other functionality depend on the application connected to Presto. Figure 3-2 shows a successful connection to Presto in SQuirreL SQL Client with some example queries and a result set.

SQuirreL SQL user interface displaying queries and result set
Figure 3-2. SQuirreL SQL Client user interface displaying queries and result set

Presto and ODBC

Similar to the connection to Presto with the JDBC driver—“Presto JDBC Driver”—Open Database Connectivity (ODBC) allows any application supporting ODBC to use Presto. It provides an API for typically C-based applications.

Currently, no open source ODBC driver for Presto is available. However, commercial drivers can be purchased from Starburst and Simba.

This enables several popular applications from the database administration, business intelligence, and reporting and analytics space, such as Microsoft Power BI, Tableau, SAS, Quest Toad, and others. ODBC also enables Microsoft Excel usage.

SQL with Presto

Presto is an ANSI SQL-compliant query engine. It allows you to query and manipulate data in any connected data source with the same SQL statements, functions, and operators.

Presto strives to be compliant with existing SQL standards. One of the main design principles of Presto is to neither invent another SQL-like query language nor deviate too much from the SQL standard. Every new functionality and language feature attempts to comply with the standard.

Extending the SQL feature set is considered only when the standard does not define an equivalent functionality. And even then, great care is taken to design the feature by considering similar features in the standard and other existing SQL implementations as a sign of what can become standard in the future.

Note

In rare cases, when the standard does not define an equivalent functionality, Presto extends the standard. A prominent example are Lambda expressions; see “Lambda Expressions”.

Presto does not define any particular SQL standard version it complies with. Instead, the standard is treated as a living document, and the newest standard version is always considered important. On the other hand, Presto does not yet implement all the mandatory features defined in the SQL standard. As a rule, if an existing functionality is found as noncompliant, it is deprecated and later replaced with a standard compliant one.

Querying Presto can be done with the Presto CLI as well as any database management tool connected with JDBC or ODBC, as discussed earlier.

First Examples

This section presents a short overview of supported SQL and Presto statements, with much more detail available in Chapter 8 and Chapter 9.

Presto metadata is contained in the system catalog. Specific statements can be used to query that data and, in general, find out more about the available catalogs, schemas, information schemas, tables, functions, and more.

Use the following to list all catalogs:

SHOW CATALOGS;
 Catalog
 ---------
 system
 tpch
 (2 rows)

Show all schemas in the tpch catalog as follows:

SHOW SCHEMAS FROM tpch;
       Schema
 ------------------
 information_schema
 sf1
 sf100
 sf1000
 sf10000
 sf100000
 sf300
 sf3000
 sf30000
 tiny
(10 rows)

Here’s how to list the tables in the sf1 catalog:

SHOW TABLES FROM tpch.sf1;
  Table
 ---------
 customer
 lineitem
 nation
 orders
 part
 partsupp
 region
 supplier
(8 rows)

Find out about the data in the region table as follows:

DESCRIBE tpch.sf1.region;
  Column   |     Type     | Extra | Comment
-----------+--------------+-------+---------
 regionkey | bigint       |       |
 name      | varchar(25)  |       |
 comment   | varchar(152) |       |
(3 rows)

Other useful statements, such as USE and SHOW FUNCTIONS, are available. More information about the system catalog and Presto statements is available in “Presto Statements”.

With the knowledge of the available catalogs, schemas, and tables, you can use standard SQL to query the data.

You can check what regions are available:

SELECT name FROM tpch.sf1.region;
    name
 ------------
 AFRICA
 AMERICA
 ASIA
 EUROPE
 MIDDLE EAST
(5 rows)

You can return a subset and order the list:

 SELECT name
 FROM tpch.sf1.region
 WHERE name like 'A%'
 ORDER BY name DESC;
  name
 ---------
 ASIA
 AMERICA
 AFRICA
(3 rows)

Joining multiple tables and other parts of the SQL standard are supported as well:

SELECT nation.name AS nation, region.name AS region
FROM tpch.sf1.region, tpch.sf1.nation
WHERE region.regionkey = nation.regionkey
AND region.name LIKE 'AFRICA';
   nation   | region
------------+--------
 MOZAMBIQUE | AFRICA
 MOROCCO    | AFRICA
 KENYA      | AFRICA
 ETHIOPIA   | AFRICA
 ALGERIA    | AFRICA
(5 rows)

Presto supports operators like || for string concatenation. You can also use mathematical operators such as + and -.

You can change the preceding query to use JOIN and concatenate the result string to one field:

SELECT nation.name || ' / ' || region.name AS Location
FROM tpch.sf1.region JOIN tpch.sf1.nation
ON region.regionkey = nation.regionkey
AND region.name LIKE 'AFRICA';
      Location
 -------------------
 MOZAMBIQUE / AFRICA
 MOROCCO / AFRICA
 KENYA / AFRICA
 ETHIOPIA / AFRICA
 ALGERIA / AFRICA
(5 rows)

In addition to the operators, Presto supports a large variety of functions. They range from simple use cases to very complex functionality. You can display a list in Presto by using SHOW FUNCTIONS.

A simple example is to calculate the average prices of all orders and display the rounded integer value:

SELECT round(avg(totalprice)) AS average_price
FROM tpch.sf1.orders;
 average_price
 --------------
     151220.0
(1 row)

More details about SQL usage are available in the Presto documentation and in Chapter 8. Information about functions and operators is also available on the website, and you can find a good overview with more examples in Chapter 9.

Conclusion

Presto is up and running. You connected a data source and used SQL to query it. You can use the Presto CLI, or applications connected to Presto with JDBC.

With this powerful combination in place, you are ready to dive deeper. In the next chapters, we are going to do exactly that: learn how to install Presto for a larger production deployment, understand the architecture of Presto, and get into the details about SQL usage.