NoSQL for Mere Mortals by Dan Sullivan

Contents

About This eBook
Title Page
Copyright Page
Dedication Page
About the Author
Contents
Preface
Acknowledgments
Introduction

Who Should Read This Book?
The Purpose of This Book
How to Read This Book
How This Book Is Organized

Part I: Introduction
Part II: Key-Value Databases
Part III: Document Databases
Part IV: Column Family Databases
Part V: Graph Databases
Part VI: Choosing a Database for Your Application
Part VII: Appendices

Part I: Introduction

1. Different Databases for Different Requirements

Relational Database Design

E-commerce Application

Early Database Management Systems

Flat File Data Management Systems
Hierarchical Data Model Systems
Network Data Management Systems
Summary of Early Database Management Systems

The Relational Database Revolution

Relational Database Management Systems

Motivations for Not Just/No SQL (NoSQL) Databases

Scalability
Cost
Flexibility
Availability

Summary
Case Study
Review Questions
References
Bibliography

2. Variety of NoSQL Databases

Data Management with Distributed Databases

Store Data Persistently
Maintain Data Consistency
Ensure Data Availability
Balancing Response Times, Consistency, and Durability
Consistency, Availability, and Partitioning: The CAP Theorem

ACID and BASE

ACID: Atomicity, Consistency, Isolation, and Durability
BASE: Basically Available, Soft State, Eventually Consistent
Types of Eventual Consistency

Four Types of NoSQL Databases

Key-Value Pair Databases
Document Databases
Column Family Databases
Graph Databases

Summary
Review Questions
References
Bibliography

Part II: Key-Value Databases

3. Introduction to Key-Value Databases

From Arrays to Key-Value Databases

Arrays: Key Value Stores with Training Wheels
Associative Arrays: Taking Off the Training Wheels
Caches: Adding Gears to the Bike
In-Memory and On-Disk Key-Value Database: From Bikes to Motorized Vehicles

Essential Features of Key-Value Databases

Simplicity: Who Needs Complicated Data Models Anyway?
Speed: There Is No Such Thing as Too Fast
Scalability: Keeping Up with the Rush

Keys: More Than Meaningless Identifiers

How to Construct a Key
Using Keys to Locate Values

Values: Storing Just About Any Data You Want

Values Do Not Require Strong Typing
Limitations on Searching for Values

Summary
Review Questions
References
Bibliography

4. Key-Value Database Terminology

Key-Value Database Data Modeling Terms

Key
Value
Namespace
Partition
Partition Key
Schemaless

Key-Value Architecture Terms

Cluster
Ring
Replication

Key-Value Implementation Terms

Hash Function
Collision
Compression

Summary
Review Questions
References

5. Designing for Key-Value Databases

Key Design and Partitioning

Keys Should Follow a Naming Convention
Well-Designed Keys Save Code
Dealing with Ranges of Values
Keys Must Take into Account Implementation Limitations
How Keys Are Used in Partitioning

Designing Structured Values

Structured Data Types Help Reduce Latency
Large Values Can Lead to Inefficient Read and Write Operations

Limitations of Key-Value Databases

Look Up Values by Key Only
Key-Value Databases Do Not Support Range Queries
No Standard Query Language Comparable to SQL for Relational Databases

Design Patterns for Key-Value Databases

Time to Live (TTL) Keys
Emulating Tables
Aggregates
Atomic Aggregates
Enumerable Keys
Indexes

Summary
Case Study: Key-Value Databases for Mobile Application Configuration
Review Questions
References

Part III: Document Databases

6. Introduction to Document Databases

What Is a Document?

Documents Are Not So Simple After All
Documents and Key-Value Pairs
Managing Multiple Documents in Collections

Avoid Explicit Schema Definitions
Basic Operations on Document Databases

Inserting Documents into a Collection
Deleting Documents from a Collection
Updating Documents in a Collection
Retrieving Documents from a Collection

Summary
Review Questions
References

7. Document Database Terminology

Document and Collection Terms

Document
Collection
Embedded Document
Schemaless
Polymorphic Schema

Types of Partitions

Vertical Partitioning
Horizontal Partitioning or Sharding

Data Modeling and Query Processing

Normalization
Denormalization
Query Processor

Summary
Review Questions
References

8. Designing for Document Databases

Normalization, Denormalization, and the Search for Proper Balance

One-to-Many Relations
Many-to-Many Relations
The Need for Joins
Executing Joins: The Heavy Lifting of Relational Databases
What Would a Document Database Modeler Do?

Planning for Mutable Documents

Avoid Moving Oversized Documents

The Goldilocks Zone of Indexes

Read-Heavy Applications
Write-Heavy Applications

Modeling Common Relations

One-to-Many Relations in Document Databases
Many-to-Many Relations in Document Databases
Modeling Hierarchies in Document Databases

Summary
Case Study: Customer Manifests

Embed or Not Embed?
Choosing Indexes
Separate Collections by Type?

Review Questions
References

Part IV: Column Family Databases

9. Introduction to Column Family Databases

In the Beginning, There Was Google BigTable

Utilizing Dynamic Control over Columns
Indexing by Row, Column Name, and Time Stamp
Controlling Location of Data
Reading and Writing Atomic Rows
Maintaining Rows in Sorted Order

Differences and Similarities to Key-Value and Document Databases

Column Family Database Features
Column Family Database Similarities to and Differences from Document Databases
Column Family Database Versus Relational Databases

Architectures Used in Column Family Databases

HBase Architecture: Variety of Nodes
Cassandra Architecture: Peer-to-Peer
Getting the Word Around: Gossip Protocol
Thermodynamics and Distributed Database: Why We Need Anti-Entropy
Hold This for Me: Hinted Handoff

When to Use Column Family Databases
Summary
Review Questions
References

10. Column Family Database Terminology

Basic Components of Column Family Databases

Keyspace
Row Key
Column
Column Families

Structures and Processes: Implementing Column Family Databases

Internal Structures and Configuration Parameters of Column Family Databases
Old Friends: Clusters and Partitions
Taking a Look Under the Hood: More Column Family Database Components

Processes and Protocols

Replication
Anti-Entropy
Gossip Protocol
Hinted Handoff

Summary
Review Questions
References

11. Designing for Column Family Databases

Guidelines for Designing Tables

Denormalize Instead of Join
Make Use of Valueless Columns
Use Both Column Names and Column Values to Store Data
Model an Entity with a Single Row
Avoid Hotspotting in Row Keys
Keep an Appropriate Number of Column Value Versions
Avoid Complex Data Structures in Column Values

Guidelines for Indexing

When to Use Secondary Indexes Managed by the Column Family Database System
When to Create and Manage Secondary Indexes Using Tables

Tools for Working with Big Data

Extracting, Transforming, and Loading Big Data
Analyzing Big Data
Tools for Monitoring Big Data

Summary
Case Study: Customer Data Analysis

Understanding User Needs

Review Questions
References

Part V: Graph Databases

12. Introduction to Graph Databases

What Is a Graph?
Graphs and Network Modeling

Modeling Geographic Locations
Modeling Infectious Diseases
Modeling Abstract and Concrete Entities
Modeling Social Media

Advantages of Graph Databases

Query Faster by Avoiding Joins
Simplified Modeling
Multiple Relations Between Entities

Summary
Review Questions
References

13. Graph Database Terminology

Elements of Graphs

Vertex
Edge
Path
Loop

Operations on Graphs

Union of Graphs
Intersection of Graphs
Graph Traversal

Properties of Graphs and Nodes

Isomorphism
Order and Size
Degree
Closeness
Betweenness

Types of Graphs

Undirected and Directed Graphs
Flow Network
Bipartite Graph
Multigraph
Weighted Graph

Summary
Review Questions
References

14. Designing for Graph Databases

Getting Started with Graph Design

Designing a Social Network Graph Database
Queries Drive Design (Again)

Querying a Graph

Cypher: Declarative Querying
Gremlin: Query by Graph Traversal

Tips and Traps of Graph Database Design

Use Indexes to Improve Retrieval Time
Use Appropriate Types of Edges
Watch for Cycles When Traversing Graphs
Consider the Scalability of Your Graph Database

Summary
Case Study: Optimizing Transportation Routes

Understanding User Needs
Designing a Graph Analysis Solution

Review Questions
References

Part VI: Choosing a Database for Your Application

15. Guidelines for Selecting a Database

Choosing a NoSQL Database

Criteria for Selecting Key-Value Databases
Use Cases and Criteria for Selecting Document Databases
Use Cases and Criteria for Selecting Column Family Databases
Use Cases and Criteria for Selecting Graph Databases

Using NoSQL and Relational Databases Together
Summary
Review Questions
References

Part VII: Appendices

A. Answers to Chapter Review Questions

Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15

B. List of NoSQL Databases

Glossary
Index
Code Snippets
