Designing Big Data Platforms, How to Use, Deploy and Maintain Big Data Systems by Aytas, Yusuf

Index

Cover Table of Contents Title Page Copyright List of Contributors Preface Acknowledgments Acronyms Introduction

1 An Introduction: What's a Modern Big Data Platform

1.1 Defining Modern Big Data Platform 1.2 Fundamentals of a Modern Big Data Platform

2 A Bird's Eye View on Big Data

2.1 A Bit of History 2.2 What Makes Big Data 2.3 Components of Big Data Architecture 2.4 Making Use of Big Data

3 A Minimal Data Processing and Management System

3.1 Problem Definition 3.2 Processing Large Data with Linux Commands 3.3 Processing Large Data with PostgreSQL 3.4 Cost of Big Data

4 Big Data Storage

4.1 Big Data Storage Patterns 4.2 On‐Premise Storage Solutions 4.3 Cloud Storage Solutions 4.4 Hybrid Storage Solutions

5 Offline Big Data Processing

5.1 Defining Offline Data Processing 5.2 MapReduce Technologies 5.3 Apache Spark 5.4 Apache Flink 5.5 Presto

6 Stream Big Data Processing

6.1 The Need for Stream Processing 6.2 Defining Stream Data Processing 6.3 Streams via Message Brokers 6.4 Streams via Stream Engines

7 Data Analytics

7.1 Log Collection 7.2 Transferring Big Data Sets 7.3 Aggregating Big Data Sets 7.4 Data Pipeline Scheduler 7.5 Patterns and Practices 7.6 Exploring Data Visually

8 Data Science

8.1 Data Science Applications 8.2 Data Science Life Cycle 8.3 Data Science Toolbox 8.4 Productionalizing Data Science

9 Data Discovery

9.1 Need for Data Discovery 9.2 Data Governance 9.3 Data Discovery Tools

10 Data Security

10.1 Infrastructure Security 10.2 Data Privacy 10.3 Law Enforcement 10.4 Data Security Tools

11 Putting All Together

11.1 Platforms 11.2 Big Data Systems and Tools 11.3 Challenges

12 An Ideal Platform

12.1 Event Sourcing 12.2 Kappa Architecture 12.3 Data Mesh 12.4 Data Reservoirs 12.5 Data Catalog 12.6 Self‐service Platform 12.7 Abstraction 12.8 Data Guild 12.9 Trade‐offs 12.10 Data Ethics

Appendix A: Further Systems and Patterns

A.1 Lambda Architecture A.2 Apache Cassandra A.3 Apache Beam

Appendix B: Recipes

B.1 Activity Tracking Recipe B.2 Data Quality Assurance B.3 Estimating Time to Delivery B.4 Incident Response Recipe B.5 Leveraging Spark SQL Metrics B.6 Airbnb Price Prediction

Bibliography Index End User License Agreement
