Index
Data Engineering with Python
Why subscribe?
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Building Data Pipelines – Extract, Transform, and Load
Chapter 1: What is Data Engineering?
What data engineers do
Required skills and knowledge to be a data engineer
Data engineering versus data science
Data engineering tools
Programming languages
Databases
Data processing engines
Data pipelines
Summary
Chapter 2: Building Our Data Engineering Infrastructure
Installing and configuring Apache NiFi
A quick tour of NiFi
PostgreSQL driver
Installing and configuring Apache Airflow
Installing and configuring Elasticsearch
Installing and configuring Kibana
Installing and configuring PostgreSQL
Installing pgAdmin 4
A tour of pgAdmin 4
Summary
Chapter 3: Reading and Writing Files
Writing and reading files in Python
Writing and reading CSVs
Reading and writing CSVs using pandas DataFrames
Writing JSON with Python
Building data pipelines in Apache Airflow
Handling files using NiFi processors
Working with CSV in NiFi
Working with JSON in NiFi
Summary
Chapter 4: Working with Databases
Inserting and extracting relational data in Python
Inserting data into PostgreSQL
Inserting and extracting NoSQL database data in Python
Installing Elasticsearch
Inserting data into Elasticsearch
Building data pipelines in Apache Airflow
Setting up the Airflow boilerplate
Running the DAG
Handling databases with NiFi processors
Extracting data from PostgreSQL
Running the data pipeline
Summary
Chapter 5: Cleaning, Transforming, and Enriching Data
Performing exploratory data analysis in Python
Downloading the data Basic data exploration
Handling common data issues using pandas
Drop rows and columns
Creating and modifying columns
Enriching data
Cleaning data using Airflow
Summary
Chapter 6: Building a 311 Data Pipeline
Building the data pipeline
Mapping a data type
Triggering a pipeline
Querying SeeClickFix
Transforming the data for Elasticsearch
Getting every page
Backfilling data
Building a Kibana dashboard
Creating visualizations
Creating a dashboard
Summary
Section 2: Deploying Data Pipelines in Production
Chapter 7: Features of a Production Pipeline
Staging and validating data
Staging data
Validating data with Great Expectations
Building idempotent data pipelines
Building atomic data pipelines
Summary
Chapter 8: Version Control with the NiFi Registry
Installing and configuring the NiFi Registry
Installing the NiFi Registry
Configuring the NiFi Registry
Using the Registry in NiFi
Adding the Registry to NiFi
Versioning your data pipelines
Using git-persistence with the NiFi Registry
Summary
Chapter 9: Monitoring Data Pipelines
Monitoring NiFi using the GUI
Monitoring NiFi with the status bar
Monitoring NiFi with processors
Using Python with the NiFi REST API
Summary
Chapter 10: Deploying Data Pipelines
Finalizing your data pipelines for production
Backpressure
Improving processor groups
Using the NiFi variable registry
Deploying your data pipelines
Using the simplest strategy
Using the middle strategy
Using multiple registries
Summary
Chapter 11: Building a Production Data Pipeline
Creating a test and production environment
Creating the databases
Populating a data lake
Building a production data pipeline
Reading the data lake
Scanning the data lake
Inserting the data into staging
Querying the staging database
Validating the staging data
Insert Warehouse
Deploying a data pipeline in production
Summary
Section 3: Beyond Batch – Building Real-Time Data Pipelines
Chapter 12: Building a Kafka Cluster
Creating ZooKeeper and Kafka clusters
Downloading Kafka and setting up the environment
Configuring ZooKeeper and Kafka
Starting the ZooKeeper and Kafka clusters
Testing the Kafka cluster
Testing the cluster with messages
Summary
Chapter 13: Streaming Data with Apache Kafka
Understanding logs
Understanding how Kafka uses logs
Topics
Kafka producers and consumers
Building data pipelines with Kafka and NiFi
The Kafka producer
The Kafka consumer
Differentiating stream processing from batch processing
Producing and consuming with Python
Writing a Kafka producer in Python
Writing a Kafka consumer in Python
Summary
Chapter 14: Data Processing with Apache Spark
Installing and running Spark
Installing and configuring PySpark
Processing data with PySpark
Spark for data engineering
Summary
Chapter 15: Real-Time Edge Data with MiNiFi, Kafka, and Spark
Setting up MiNiFi
Building a MiNiFi task in NiFi
Summary
Appendix
Building a NiFi cluster
The basics of NiFi clustering
Building a NiFi cluster
Building a distributed data pipeline
Managing the distributed data pipeline
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think