Imperial Library
Index
Foreword
Preface
About The Book
Conventions Used in This Book
Code Examples, Permissions and Attribution
O’Reilly Online Learning
How to Contact Us
Acknowledgments
I. Getting Started with Presto
1. Introducing Presto
The Problems with Big Data
Presto to the Rescue
Designed for Performance and Scale
SQL-on-Anything
Separation of Data Storage and Query Compute Resources
Presto Use Cases
One SQL Analytics Access Point
Access Point to Data Warehouse and Source Systems
Provide SQL-Based Access to Anything
Federated Queries
Semantic Layer for a Virtual Data Warehouse
Data Lake Query Engine
SQL Conversions and ETL
Better Insights Due to Faster Response Times
Big Data, Machine Learning and Artificial Intelligence
Other Use Cases
Presto Resources
Website
Documentation
Community Chat
Source Code, License and Version
Contributing
Book Repository
Iris Data Set
Flight Data Set
A Brief History of Presto
Conclusion
2. Installing and Configuring Presto
Trying Presto with the Docker Container
Installing from Archive File
Java Virtual Machine
Python
Installation
Configuration
Adding a Data Source
Running Presto
Conclusion
3. Using Presto
Presto Command Line Interface
Getting Started
Pagination
History
Additional Diagnostics
Executing Queries
Output Formats
Ignoring Errors
Presto JDBC Driver
Downloading and Registering the Driver
Establishing a Connection to Presto
Presto and ODBC
Client Libraries
Presto WebUI
SQL with Presto
Concepts
First Examples
Conclusion
II. Diving Deeper into Presto
4. Presto Architecture
Coordinator and Workers in a Cluster
Coordinator
Discovery Service
Workers
Connector-Based Architecture
Catalogs, Schemas and Tables
Query Execution Model
Query Planning
Parsing & Analysis
Initial Query Planning
Optimization Rules
Predicate Push Down
Cross Join Elimination
TopN
Partial Aggregations
Implementation Rules
Lateral Join Decorrelation
Semi-Join (IN) Decorrelation
Cost-Based Optimizer (CBO)
The Cost Concept
Cost of the Join
Table Statistics
Filter Statistics
Table Statistics for Partitioned Tables
Join Enumeration
Broadcast vs. Distributed Joins
Working with Table Statistics
Presto ANALYZE
Gathering Statistics When Writing to Disk
Hive Analyze
Displaying Table Statistics
Conclusion
5. Production-Ready Deployment
Configuration Details
Server Configuration
Logging
Node Configuration
JVM Configuration
Launcher
Cluster Installation
RPM Installation
Installation Directory Structure
Configuration
Uninstall Presto
Installation in the Cloud
Cluster Sizing Considerations
Conclusion
6. Connectors
Configuration
RDBMS Connector Example PostgreSQL
Query Pushdown
Parallelism and Concurrency
Other RDBMS Connectors
Security
Conclusion
Presto TPC-H and TPC-DS Connectors
Hive Connector for Distributed Storage Data Sources
Apache Hadoop and Hive
Hive Connector
Hive-Style Table Format
Managed and External Tables
Partitioned Data
Loading Data
File Formats and Compression
MinIO Example
Summary
Non-Relational Data Sources
Presto JMX Connector
Black Hole Connector
Memory Connector
Other Connectors
Conclusion
7. Advanced Connector Examples
Connecting to HBase with Phoenix
Key Value Store Connector Example Accumulo
Using the Presto Accumulo Connector
Predicate Pushdown in Accumulo
Apache Cassandra Connector
Streaming System Connector Example Kafka
Document Store Connector Example Elasticsearch
Overview
Configuration and Usage
Query Processing
Full Text Search
Conclusion
Query Federation in Presto
Extract, Transform, Load and Federated Queries
Conclusion
8. Using SQL in Presto
Presto Statements
Presto System Tables
Catalogs
Schemas
Information Schema
Tables
Table and Column Properties
Copying an Existing Table
Creating a New Table from Query Results
Modifying a Table
Deleting a Table
Table Limitations from Connectors
Views
Session Information and Configuration
Data Types
Collection Data Types
Temporal Data Types
Type Casting
SELECT Statement Basics
WHERE Clause
GROUP BY and HAVING Clauses
ORDER BY and LIMIT Clauses
JOIN Statements
UNION, INTERSECT, and EXCEPT Clauses
Grouping Operations
WITH Clause
Subqueries
Scalar Subquery
EXISTS Subquery
Quantified Subquery
Deleting Data From A Table
Conclusion
9. Advanced SQL
Functions and Operators Introduction
Scalar Functions and Operators
Boolean Operators
Logical Operators
Range Selection with the BETWEEN Statement
Value Detection with IS [NOT] NULL
Mathematical Functions and Operators
Trigonometric Functions
Constant and Random Functions
String Functions and Operators
Strings and Maps
Unicode
Regular Expressions
Unnesting Complex Data Types
JSON Functions
Date and Time Functions and Operators
Histograms
Aggregate Functions
Map Aggregate Functions
Approximate Aggregate Functions
Window Functions
Lambda Expressions
Geospatial Functions
Prepared Statements
Conclusion
III. Presto in Real World Usage
10. Security
Authentication
Password and LDAP Authentication
Authorization
System Access Control
Connector Access Control
Encryption
Encrypting Presto Client to Coordinator Communication
Creating Java Keystores and Java Truststores
Encrypting Communication within the Presto Cluster
Certificate Authority vs. Self-Signed Certificates
Certificate Authentication
Kerberos
Prerequisites
Kerberos Client Authentication
Cluster Internal Kerberos
Data Source Access and Configuration for Security
Kerberos Authentication with the Hive Connector
Hive Metastore Thrift Service Authentication
HDFS Authentication
Cluster Separation
Conclusion
11. Integrating Presto with Other Tools
Queries, Visualizations and More with Apache Superset
Performance Improvements with Rubix
Workflows with Apache Airflow
Embedded Presto Example Amazon Athena
Starburst Enterprise Presto
Other Integration Examples
Custom Integrations
Conclusion
12. Presto in Production
Monitoring with the Presto WebUI
Cluster Level Details
Query Level Details
Query Detail View
Tuning Presto SQL Queries
Memory Management
Task Concurrency
Worker Scheduling
Scheduling Splits per Task and per Node
Local Scheduling
Network Data Exchange
Concurrency
Buffer Sizes
Tuning Java Virtual Machine
Resource Groups
Resource Group Definition
Scheduling Policy
Selector Rules Definition
Conclusion
13. Real World Examples
Deployment and Runtime Platforms
Cluster Sizing
Hadoop/Hive Migration Use Case
Other Data Sources
Users and Traffic
Conclusion
14. Conclusion
Index