Index
Foreword
Preface
  • About The Book
  • Conventions Used in This Book
  • Code Examples, Permissions and Attribution
  • O’Reilly Online Learning
  • How to Contact Us
  • Acknowledgments
I. Getting Started with Presto
1. Introducing Presto
  • The Problems with Big Data
  • Presto to the Rescue
      • Designed for Performance and Scale
      • SQL-on-Anything
      • Separation of Data Storage and Query Compute Resources
  • Presto Use Cases
      • One SQL Analytics Access Point
      • Access Point to Data Warehouse and Source Systems
      • Provide SQL-Based Access to Anything
      • Federated Queries
      • Semantic Layer for a Virtual Data Warehouse
      • Data Lake Query Engine
      • SQL Conversions and ETL
      • Better Insights Due to Faster Response Times
      • Big Data, Machine Learning and Artificial Intelligence
      • Other Use Cases
  • Presto Resources
      • Website
      • Documentation
      • Community Chat
      • Source Code, License and Version
      • Contributing
      • Book Repository
      • Iris Data Set
      • Flight Data Set
  • A Brief History of Presto
  • Conclusion
2. Installing and Configuring Presto
  • Trying Presto with the Docker Container
  • Installing from Archive File
      • Java Virtual Machine
      • Python
      • Installation
      • Configuration
  • Adding a Data Source
  • Running Presto
  • Conclusion
3. Using Presto
  • Presto Command Line Interface
      • Getting Started
      • Pagination
      • History
      • Additional Diagnostics
      • Executing Queries
      • Output Formats
      • Ignoring Errors
  • Presto JDBC Driver
      • Downloading and Registering the Driver
      • Establishing a Connection to Presto
  • Presto and ODBC
  • Client Libraries
  • Presto WebUI
  • SQL with Presto
      • Concepts
      • First Examples
  • Conclusion
II. Diving Deeper into Presto
4. Presto Architecture
  • Coordinator and Workers in a Cluster
  • Coordinator
  • Discovery Service
  • Workers
  • Connector-Based Architecture
  • Catalogs, Schemas and Tables
  • Query Execution Model
  • Query Planning
      • Parsing & Analysis
      • Initial Query Planning
  • Optimization Rules
      • Predicate Push Down
      • Cross Join Elimination
      • TopN
      • Partial Aggregations
  • Implementation Rules
      • Lateral Join Decorrelation
      • Semi-Join (IN) Decorrelation
  • Cost-Based Optimizer (CBO)
      • The Cost Concept
      • Cost of the Join
      • Table Statistics
      • Filter Statistics
      • Table Statistics for Partitioned Tables
      • Join Enumeration
      • Broadcast vs. Distributed Joins
  • Working with Table Statistics
      • Presto ANALYZE
      • Gathering Statistics When Writing to Disk
      • Hive Analyze
      • Displaying Table Statistics
  • Conclusion
5. Production-Ready Deployment
  • Configuration Details
  • Server Configuration
  • Logging
  • Node Configuration
  • JVM Configuration
  • Launcher
  • Cluster Installation
  • RPM Installation
      • Installation Directory Structure
      • Configuration
      • Uninstall Presto
  • Installation in the Cloud
  • Cluster Sizing Considerations
  • Conclusion
6. Connectors
  • Configuration
  • RDBMS Connector Example PostgreSQL
      • Query Pushdown
      • Parallelism and Concurrency
      • Other RDBMS Connectors
      • Security
      • Conclusion
  • Presto TPC-H and TPC-DS Connectors
  • Hive Connector for Distributed Storage Data Sources
      • Apache Hadoop and Hive
      • Hive Connector
      • Hive-Style Table Format
      • Managed and External Tables
      • Partitioned Data
      • Loading Data
      • File Formats and Compression
      • MinIO Example
      • Summary
  • Non-Relational Data Sources
  • Presto JMX Connector
  • Black Hole Connector
  • Memory Connector
  • Other Connectors
  • Conclusion
7. Advanced Connector Examples
  • Connecting to HBase with Phoenix
  • Key Value Store Connector Example Accumulo
      • Using the Presto Accumulo Connector
      • Predicate Pushdown in Accumulo
  • Apache Cassandra Connector
  • Streaming System Connector Example Kafka
  • Document Store Connector Example Elasticsearch
      • Overview
      • Configuration and Usage
      • Query Processing
      • Full Text Search
      • Conclusion
  • Query Federation in Presto
  • Extract, Transform, Load and Federated Queries
  • Conclusion
8. Using SQL in Presto
  • Presto Statements
  • Presto System Tables
  • Catalogs
  • Schemas
  • Information Schema
  • Tables
      • Table and Column Properties
      • Copying an Existing Table
      • Creating a New Table from Query Results
      • Modifying a Table
      • Deleting a Table
      • Table Limitations from Connectors
  • Views
  • Session Information and Configuration
  • Data Types
      • Collection Data Types
      • Temporal Data Types
      • Type Casting
  • SELECT Statement Basics
  • WHERE Clause
  • GROUP BY and HAVING Clauses
  • ORDER BY and LIMIT Clauses
  • JOIN Statements
  • UNION, INTERSECT, and EXCEPT Clauses
  • Grouping Operations
  • WITH Clause
  • Subqueries
      • Scalar Subquery
      • EXISTS Subquery
      • Quantified Subquery
  • Deleting Data From A Table
  • Conclusion
9. Advanced SQL
  • Functions and Operators Introduction
  • Scalar Functions and Operators
  • Boolean Operators
  • Logical Operators
  • Range Selection with the BETWEEN Statement
  • Value Detection with IS [NOT] NULL
  • Mathematical Functions and Operators
  • Trigonometric Functions
  • Constant and Random Functions
  • String Functions and Operators
  • Strings and Maps
  • Unicode
  • Regular Expressions
  • Unnesting Complex Data Types
  • JSON Functions
  • Date and Time Functions and Operators
  • Histograms
  • Aggregate Functions
  • Map Aggregate Functions
  • Approximate Aggregate Functions
  • Window Functions
  • Lambda Expressions
  • Geospatial Functions
  • Prepared Statements
  • Conclusion
III. Presto in Real World Usage
10. Security
  • Authentication
      • Password and LDAP Authentication
  • Authorization
      • System Access Control
      • Connector Access Control
  • Encryption
      • Encrypting Presto Client to Coordinator Communication
      • Creating Java Keystores and Java Truststores
      • Encrypting Communication within the Presto Cluster
  • Certificate Authority vs. Self-Signed Certificates
  • Certificate Authentication
  • Kerberos
      • Prerequisites
      • Kerberos Client Authentication
      • Cluster Internal Kerberos
  • Data Source Access and Configuration for Security
  • Kerberos Authentication with the Hive Connector
      • Hive Metastore Thrift Service Authentication
      • HDFS Authentication
  • Cluster Separation
  • Conclusion
11. Integrating Presto with Other Tools
  • Queries, Visualizations and More with Apache Superset
  • Performance Improvements with Rubix
  • Workflows with Apache Airflow
  • Embedded Presto Example Amazon Athena
  • Starburst Enterprise Presto
  • Other Integration Examples
  • Custom Integrations
  • Conclusion
12. Presto in Production
  • Monitoring with the Presto WebUI
      • Cluster Level Details
      • Query Level Details
      • Query Detail View
  • Tuning Presto SQL Queries
  • Memory Management
  • Task Concurrency
  • Worker Scheduling
      • Scheduling Splits per Task and per Node
      • Local Scheduling
  • Network Data Exchange
      • Concurrency
      • Buffer Sizes
  • Tuning Java Virtual Machine
  • Resource Groups
      • Resource Group Definition
      • Scheduling Policy
      • Selector Rules Definition
  • Conclusion
13. Real World Examples
  • Deployment and Runtime Platforms
  • Cluster Sizing
  • Hadoop/Hive Migration Use Case
  • Other Data Sources
  • Users and Traffic
  • Conclusion
14. Conclusion
Index
