Index
Cover
Title Page
Copyright
Dedication
About the Authors
Credits
Acknowledgments
Introduction
The Origins of Kettle
About This Book
How This Book Is Organized
Prerequisites
On the Website
Further Resources
Part I: Getting Started
Chapter 1: ETL Primer
OLTP versus Data Warehousing
What Is ETL?
ETL, ELT, and EII
Data Integration Challenges
ETL Tool Requirements
Summary
Chapter 2: Kettle Concepts
Design Principles
The Building Blocks of Kettle Design
Parameters and Variables
Visual Programming
Summary
Chapter 3: Installation and Configuration
Kettle Software Overview
Installation
Configuration
Summary
Chapter 4: An Example ETL Solution—Sakila
Sakila
Prerequisites and Some Basic Spoon Skills
The Sample ETL Solution
Summary
Part II: ETL
Chapter 5: ETL Subsystems
Introduction to the 34 Subsystems
Summary
Chapter 6: Data Extraction
Kettle Data Extraction Overview
Working with ERP and CRM Systems
Data Profiling
CDC: Change Data Capture
Delivering Data
Summary
Chapter 7: Cleansing and Conforming
Data Cleansing
Error Handling
Auditing Data and Process Quality
Deduplicating Data
Scripting
Summary
Chapter 8: Handling Dimension Tables
Managing Keys
Loading Dimension Tables
Slowly Changing Dimensions
More Dimensions
Summary
Chapter 9: Loading Fact Tables
Loading in Bulk
Dimension Lookups
Fact Table Handling
Summary
Chapter 10: Working with OLAP Data
OLAP Benefits and Challenges
Working with Mondrian
Working with XML/A Servers
Working with Palo
Summary
Part III: Management and Deployment
Chapter 11: ETL Development Lifecycle
Solution Design
Agile Development
Testing and Debugging
Documenting the Solution
Summary
Chapter 12: Scheduling and Monitoring
Scheduling
Monitoring
Summary
Chapter 13: Versioning and Migration
Version Control Systems
Kettle Metadata
Managing Repositories
Version Migration System
Summary
Chapter 14: Lineage and Auditing
Batch-Level Lineage Extraction
Lineage
Logging and Operational Metadata
Summary
Part IV: Performance and Scalability
Chapter 15: Performance Tuning
Transformation Performance: Finding the Weakest Link
Improving Transformation Performance
Improving Job Performance
Summary
Chapter 16: Parallelization, Clustering, and Partitioning
Multi-Threading
Using Carte as a Slave Server
Clustering Transformations
Partitioning
Summary
Chapter 17: Dynamic Clustering in the Cloud
Dynamic Clustering
Cloud Computing
EC2
Summary
Chapter 18: Real-Time Data Integration
Introduction to Real-Time ETL
Transformation Streaming
Summary
Part V: Advanced Topics
Chapter 19: Data Vault Management
Introduction to Data Vault Modeling
Do You Need a Data Vault?
Data Vault Building Blocks
Transforming Sakila to the Data Vault Model
Loading the Data Vault: A Sample ETL Solution
Updating a Data Mart from a Data Vault
Summary
Chapter 20: Handling Complex Data Formats
Non-Relational and Non-Tabular Data Formats
Non-Relational Tabular Formats
Semi- and Unstructured Data
Key/Value Pairs
Summary
Chapter 21: Web Services
Web Pages and Web Services
Data Formats
XML Examples
SOAP Examples
JSON Example
RSS
Summary
Chapter 22: Kettle Integration
The Kettle API
Executing Existing Transformations and Jobs
Embedding Kettle
OEM Versions and Forks
Summary
Chapter 23: Extending Kettle
Plugin Architecture Overview
Transformation Step Plugins
The User-Defined Java Class Step
Job Entry Plugins
Partitioning Method Plugins
Repository Type Plugins
Database Type Plugins
Summary
Appendix A: The Kettle Ecosystem
Kettle Development and Versions
The Pentaho Community Wiki
Using the Forums
Jira
##pentaho
Appendix B: Kettle Enterprise Edition Features
Appendix C: Built-in Variables and Properties Reference
Internal Variables
Kettle Variables
Variables for Configuring VFS
Noteworthy JRE Variables
Index