Log In
Or create an account ->
Imperial Library
Home
About
News
Upload
Forum
Help
Login/SignUp
Index
Half title
Reviews
Title page
Imprints page
Brief Contents
Contents
About the Authors
Preface
Who This Book is For
Topics Covered in this Book
How to Read this Book
Cross-Chapter Case Study: Sober
Additional Material
Acknowledgments
Part I Databases and Database Design
Contents
1 Fundamental Concepts of Database Management
1.1 Applications of Database Technology
1.2 Key Definitions
1.3 File versus Database Approach to Data Management
1.3.1 The File-Based Approach
1.3.2 The Database Approach
1.4 Elements of a Database System
1.4.1 Database Model versus Instances
1.4.2 Data Model
1.4.3 The Three-Layer Architecture
1.4.4 Catalog
1.4.5 Database Users
1.4.6 Database Languages
1.5 Advantages of Database Systems and Database Management
1.5.1 Data Independence
1.5.2 Database Modeling
1.5.3 Managing Structured, Semi-Structured, and Unstructured Data
1.5.4 Managing Data Redundancy
1.5.5 Specifying Integrity Rules
1.5.6 Concurrency Control
1.5.7 Backup and Recovery Facilities
1.5.8 Data Security
1.5.9 Performance Utilities
Summary
Problems and Exercises
2 Architecture and Categorization of DBMSs
2.1 Architecture of a DBMS
2.1.1 Connection and Security Manager
2.1.2 DDL Compiler
2.1.3 Query Processor
2.1.3.1 DML Compiler
2.1.3.2 Query Parser and Query Rewriter
2.1.3.3 Query Optimizer
2.1.3.4 Query Executor
2.1.4 Storage Manager
2.1.4.1 Transaction Manager
2.1.4.2 Buffer Manager
2.1.4.3 Lock Manager
2.1.4.4 Recovery Manager
2.1.5 DBMS Utilities
2.1.6 DBMS Interfaces
2.2 Categorization of DBMSs
2.2.1 Categorization Based on Data Model
2.2.1.1 Hierarchical DBMSs
2.2.1.2 Network DBMSs
2.2.1.3 Relational DBMSs
2.2.1.4 Object-Oriented DBMSs
2.2.1.5 Object-Relational/Extended Relational DBMSs
2.2.1.6 XML DBMSs
2.2.1.7 NoSQL DBMSs
2.2.2 Categorization Based on Degree of Simultaneous Access
2.2.3 Categorization Based on Architecture
2.2.4 Categorization Based on Usage
Summary
Problems and Exercises
3 Conceptual Data Modeling Using the (E)ER Model and UML Class Diagram
3.1 Phases of Database Design
3.2 The Entity Relationship Model
3.2.1 Entity Types
3.2.2 Attribute Types
3.2.3.1 Domains
3.2.3.2 Key Attribute Types
3.2.3.3 Simple versus Composite Attribute Types
3.2.3.4 Single-Valued versus Multi-Valued Attribute Types
3.2.3.5 Derived Attribute Type
3.2.4 Relationship Types
3.2.4.1 Degree and Roles
3.2.4.2 Cardinalities
3.2.4.3 Relationship Attribute Types
3.2.5 Weak Entity Types
3.2.6 Ternary Relationship Types
3.2.7 Examples of the ER Model
3.2.8 Limitations of the ER Model
3.3 The Enhanced Entity Relationship (EER) Model
3.3.1 Specialization/Generalization
3.3.2 Categorization
3.3.3 Aggregation
3.3.4 Examples of the EER Model
3.3.5 Designing an EER Model
3.4 The UML Class Diagram
3.4.1 Recap of Object Orientation
3.4.2 Classes
3.4.3 Variables
3.4.4 Access Modifiers
3.4.5 Associations
3.4.5.1 Association Class
3.4.5.2 Unidirectional versus Bidirectional Association
3.4.5.3 Qualified Association
3.4.6 Specialization/Generalization
3.4.7 Aggregation
3.4.8 UML Example
3.4.9 Advanced UML Modeling Concepts
3.4.9.1 Changeability Property
3.4.9.2 Object Constraint Language (OCL)
3.4.9.3 Dependency Relationship
3.4.10 UML versus EER
Summary
Problems and Exercises
4 Organizational Aspects of Data Management
4.1 Data Management
4.1.1 Catalogs and the Role of Metadata
4.1.2 Metadata Modeling
4.1.3 Data Quality
4.1.3.1 Data Quality Dimensions
Accuracy
Completeness
Consistency
Accessibility
4.1.3.2 Data Quality Problems
4.1.4 Data Governance
4.2 Roles in Data Management
4.2.1 Information Architect
4.2.2 Database Designer
4.2.3 Data Owner
4.2.4 Data Steward
4.2.5 Database Administrator
4.2.6 Data Scientist
Summary
Problems and Exercises
Part II Types of Database Systems
Contents
5 Legacy Databases
5.1 The Hierarchical Model
5.2 The CODASYL Model
Summary
Problems and Exercises
6 Relational Databases
6.1 The Relational Model
6.1.1 Basic Concepts
6.1.2 Formal Definitions
6.1.3 Types of Keys
6.1.3.1 Superkeys and Keys
6.1.3.2 Candidate Keys, Primary Keys, and Alternative Keys
6.1.3.3 Foreign Keys
6.1.4 Relational Constraints
6.1.5 Example Relational Data Model
6.2 Normalization
6.2.1 Insertion, Deletion, and Update Anomalies in an Unnormalized Relational Model
6.2.2 Informal Normalization Guidelines
6.2.3 Functional Dependencies and Prime Attribute Types
6.2.4 Normalization Forms
6.2.4.1 First Normal Form (1 NF)
6.2.4.2 Second Normal Form (2 NF)
6.2.4.3 Third Normal Form (3 NF)
6.2.4.4 Boyce–Codd Normal Form (BCNF)
6.2.4.5 Fourth Normal Form (4 NF)
6.3 Mapping a Conceptual ER Model to a Relational Model
6.3.1 Mapping Entity Types
6.3.2 Mapping Relationship Types
6.3.2.1 Mapping a Binary 1:1 Relationship type
6.3.2.2 Mapping a Binary 1:N Relationship Type
6.3.2.3 Mapping a Binary M:N Relationship Type
6.3.2.4 Mapping Unary Relationship Types
6.3.2.5 Mapping n-ary Relationship Types
6.3.3 Mapping Multi-Valued Attribute Types
6.3.4 Mapping Weak Entity Types
6.3.5 Putting it All Together
6.4 Mapping a Conceptual EER Model to a Relational Model
6.4.1 Mapping an EER Specialization
6.4.2 Mapping an EER Categorization
6.4.3 Mapping an EER Aggregation
Summary
Problems and Exercises
7 Relational Databases
7.1 Relational Database Management Systems and SQL
7.1.1 Key Characteristics of SQL
7.1.2 Three-Layer Database Architecture
7.2 SQL Data Definition Language
7.2.1 Key DDL Concepts
7.2.2 DDL Example
7.2.3 Referential Integrity Constraints
7.2.4 DROP and ALTER Command
7.3 SQL Data Manipulation Language
7.3.1 SQL SELECT Statement
7.3.1.1 Simple Queries
7.3.1.2 Queries with Aggregate Functions
7.3.1.3 Queries with GROUP BY/HAVING
7.3.1.4 Queries with ORDER BY
7.3.1.5 Join Queries
Inner Joins
Outer Joins
7.3.1.6 Nested Queries
7.3.1.7 Correlated Queries
7.3.1.8 Queries with ALL/ANY
7.3.1.9 Queries with EXISTS
7.3.1.10 Queries with Subqueries in SELECT/FROM
7.3.1.11 Queries with Set Operations
7.3.2 SQL INSERT Statement
7.3.3 SQL DELETE Statement
7.3.4 SQL UPDATE Statement
7.4 SQL Views
7.5 SQL Indexes
7.6 SQL Privileges
7.7 SQL for Metadata Management
Summary
Problems and Exercises
8 Object-Oriented Databases and Object Persistence
8.1 Recap: Basic Concepts of OO
8.2 Advanced Concepts of OO
8.2.1 Method Overloading
8.2.2 Inheritance
8.2.3 Method Overriding
8.2.4 Polymorphism and Dynamic Binding
8.3 Basic Principles of Object Persistence
8.3.1 Serialization
8.4 OODBMS
8.4.1 Object Identifiers
8.4.2 ODMG Standard
8.4.3 Object Model
8.4.4 Object Definition Language (ODL)
8.4.5 Object Query Language (OQL)
8.4.5.1 Simple OQL Queries
8.4.5.2 SELECT FROM WHERE OQL Queries
8.4.5.3 Join OQL Queries
8.4.5.4 Other OQL Queries
8.4.6 Language Bindings
8.5 Evaluating OODBMSs
Summary
Problems and Exercises
9 Extended Relational Databases
9.1 Limitations of the Relational Model
9.2 Active RDBMS Extensions
9.2.1 Triggers
9.2.2 Stored Procedures
9.3 Object-Relational RDBMS Extensions
9.3.1 User-Defined Types
9.3.1.1 Distinct Data Types
9.3.1.2 Opaque Data Types
9.3.1.3 Unnamed Row Types
9.3.1.4 Named Row Types
9.3.1.5 Table Data Types
9.3.2 User-Defined Functions
9.3.3 Inheritance
9.3.3.1 Inheritance at Data Type Level
9.3.3.2 Inheritance at Table Type Level
9.3.4 Behavior
9.3.5 Polymorphism
9.3.6 Collection Types
9.3.7 Large Objects
9.4 Recursive SQL Queries
Summary
Problems and Exercises
10 XML Databases
10.1 Extensible Markup Language
10.1.1 Basic Concepts
10.1.2 Document Type Definition and XML Schema Definition
10.1.3 Extensible Stylesheet Language
10.1.4 Namespaces
10.1.5 XPath
10.2 Processing XML Documents
10.3 Storage of XML Documents
10.3.1 The Document-Oriented Approach for Storing XML Documents
10.3.2 The Data-Oriented Approach for Storing XML Documents
10.3.3 The Combined Approach for Storing XML Documents
10.4 Differences Between XML Data and Relational Data
10.5 Mappings Between XML Documents and (Object-) Relational Data
10.5.1 Table-Based Mapping
10.5.2 Schema-Oblivious Mapping
10.5.3 Schema-Aware Mapping
10.5.4 SQL/XML
10.6 Searching XML Data
10.6.1 Full-Text Search
10.6.2 Keyword-Based Search
10.6.3 Structured Search With XQuery
10.6.4 Semantic Search With RDF and SPARQL
10.7 XML for Information Exchange
10.7.1 Message-Oriented Middleware
10.7.2 SOAP-Based Web Services
10.7.3 REST-Based Web Services
10.7.4 Web Services and Databases
10.8 Other Data Representation Formats
Summary
Problems and Exercises
11 NoSQL Databases
11.1 The NoSQL Movement
11.1.1 The End of the “One Size Fits All” Era?
11.1.2 The Emergence of the NoSQL Movement
11.2 Key–Value Stores
11.2.1 From Keys to Hashes
11.2.2 Horizontal Scaling
11.2.3 An Example: Memcached
11.2.4 Request Coordination
11.2.5 Consistent Hashing
11.2.6 Replication and Redundancy
11.2.7 Eventual Consistency
11.2.8 Stabilization
11.2.9 Integrity Constraints and Querying
11.3 Tuple and Document Stores
11.3.1 Items with Keys
11.3.2 Filters and Queries
11.3.3 Complex Queries and Aggregation with MapReduce
11.3.4 SQL After All…
11.4 Column-Oriented Databases
11.5 Graph-Based Databases
11.5.1 Cypher Overview
11.5.2 Exploring a Social Graph
11.6 Other NoSQL Categories
Summary
Problems and Exercises
Part III Physical Data Storage, Transaction Management, and Database Access
Contents
12 Physical File Organization and Indexing
12.1 Storage Hardware and Physical Database Design
12.1.1 The Storage Hierarchy
12.1.2 Internals of Hard Disk Drives
12.1.3 From Logical Concepts to Physical Constructs
12.2 Record Organization
12.3 File Organization
12.3.1 Introductory Concepts: Search Keys, Primary, and Secondary File Organization
12.3.2 Heap File Organization
12.3.3 Sequential File Organization
12.3.4 Random File Organization (Hashing)
12.3.4.1 Key-to-Address Transformation
12.3.4.2 Factors that Determine the Efficiency of Random File Organization
12.3.5 Indexed Sequential File Organization
12.3.5.1 Basic Terminology of Indexes
12.3.5.2 Primary Indexes
12.3.5.3 Clustered Indexes
12.3.5.4 Multilevel Indexes
12.3.6 List Data Organization (Linear and Nonlinear Lists)
12.3.6.1 Linear Lists
12.3.6.2 Tree Data Structures
12.3.7 Secondary Indexes and Inverted Files
12.3.7.1 Characteristics of Secondary Indexes
12.3.7.2 Inverted Files
12.3.7.3 Multicolumn Indexes
12.3.7.4 Other Index Types
12.3.8 B-Trees and B+-Trees
12.3.8.1 Multilevel Indexes Revisited
12.3.8.2 Binary Search Trees
12.3.8.3 B-Trees
12.3.8.4 B+-Trees
Summary
Problems and Exercises
13 Physical Database Organization
13.1 Physical Database Organization and Database Access Methods
13.1.1 From Database to Tablespace
13.1.2 Index Design
13.1.3 Database Access Methods
13.1.3.1 Functioning of the Query Optimizer
13.1.3.2 Index Search (with Atomic Search Key)
13.1.3.3 Multiple Index and Multicolumn Index Search
13.1.3.4 Index-Only Access
13.1.3.5 Full Table Scan
13.1.4 Join Implementations
13.1.4.1 Nested-Loop Join
13.1.4.2 Sort-Merge Join
13.1.4.3 Hash Join
13.2 Enterprise Storage Subsystems and Business Continuity
13.2.1 Disk Arrays and RAID
13.2.2 Enterprise Storage Subsystems
13.2.2.1 Overview and Classification
13.2.2.2 DAS (Directly Attached Storage)
13.2.2.3 SAN (Storage Area Network)
13.2.2.4 NAS (Network Attached Storage)
13.2.2.5 NAS Gateway
13.2.2.6 iSCSI/Storage Over IP
13.2.3 Business Continuity
13.2.3.1 Contingency Planning, Recovery Point, and Recovery Time
13.2.3.2 Availability and Accessibility of Storage Devices
13.2.3.3 Availability of Database Functionality
13.2.3.4 Data Availability
Summary
Problems and Exercises
14 Basics of Transaction Management
14.1 Transactions, Recovery, and Concurrency Control
14.2 Transactions and Transaction Management
14.2.1 Delineating Transactions and the Transaction Lifecycle
14.2.2 DBMS Components Involved in Transaction Management
14.2.3 The Logfile
14.3 Recovery
14.3.1 Types of Failures
14.3.2 System Recovery
14.3.3 Media Recovery
14.4 Concurrency Control
14.4.1 Typical Concurrency Problems
14.4.1.1 Lost Update Problem
14.4.1.2 Uncommitted Dependency Problem (aka Dirty Read Problem)
14.4.1.3 Inconsistent Analysis Problem
14.4.1.4 Other Concurrency-Related Problems
14.4.2 Schedules and Serial Schedules
14.4.3 Serializable Schedules
14.4.4 Optimistic and Pessimistic Schedulers
14.4.5 Locking and Locking Protocols
14.4.5.1 Purposes of Locking
14.4.5.2 The Two-Phase Locking Protocol (2PL)
14.4.5.3 Cascading Rollbacks
14.4.5.4 Dealing with Deadlocks
14.4.5.5 Isolation Levels
14.4.5.6 Lock Granularity
14.5 The ACID Properties of Transactions
Summary
Problems and Exercises
15 Accessing Databases and Database APIs
15.1 Database System Architectures
15.1.1 Centralized System Architectures
15.1.2 Tiered System Architectures
15.2 Classification of Database APIs
15.2.1 Proprietary versus Universal APIs
15.2.2 Embedded versus Call-Level APIs
15.2.3 Early Binding versus Late Binding
15.3 Universal Database APIs
15.3.1 ODBC
15.3.2 OLE DB and ADO
15.3.3 ADO.NET
15.3.4 Java DataBase Connectivity (JDBC)
15.3.5 Intermezzo: SQL Injection and Access Security
15.3.6 SQLJ
15.3.7 Intermezzo: Embedded APIs versus Embedded DBMSs
15.3.8 Language-Integrated Querying
15.4 Object Persistence and Object-Relational Mapping APIs
15.4.1 Object Persistence with Enterprise JavaBeans
15.4.2 Object Persistence with the Java Persistence API
15.4.3 Object Persistence with Java Data Objects
15.4.4 Object Persistence in Other Host Languages
15.5 Database API Summary
15.6 Database Access in the World Wide Web
15.6.1 Introduction: the Original Web Server
15.6.2 The Common Gateway Interface: Toward Dynamic Web Pages
15.6.3 Client-Side Scripting: The Desire for a Richer Web
15.6.4 JavaScript as a Platform
15.6.5 DBMSs Adapt: REST, Other Web Services, and a Look Ahead
Summary
Problems and Exercises
16 Data Distribution and Distributed Transaction Management
16.1 Distributed Systems and Distributed Databases
16.2 Architectural Implications of Distributed Databases
16.3 Fragmentation, Allocation, and Replication
16.3.1 Vertical Fragmentation
16.3.2 Horizontal Fragmentation (Sharding)
16.3.3 Mixed Fragmentation
16.3.4 Replication
16.3.5 Distribution and Replication of Metadata
16.4 Transparency
16.5 Distributed Query Processing
16.6 Distributed Transaction Management and Concurrency Control
16.6.1 Primary Site and Primary Copy 2PL
16.6.2 Distributed 2PL
16.6.3 The Two-Phase Commit Protocol (2PC)
16.6.4 Optimistic Concurrency and Loosely Coupled Systems
16.6.5 Compensation-Based Transaction Models
16.7 Eventual Consistency and BASE Transactions
16.7.1 Horizontal Fragmentation and Consistent Hashing
16.7.2 The CAP Theorem
16.7.3 BASE Transactions
16.7.4 Multi-Version Concurrency Control and Vector Clocks
16.7.5 Quorum-Based Consistency
Summary
Problems and Exercises
Part IV Data Warehousing, Data Governance, and (Big) Data Analytics
Contents
17 Data Warehousing and Business Intelligence
17.1 Operational versus Tactical/Strategic Decision-Making
17.2 Data Warehouse Definition
17.3 Data Warehouse Schemas
17.3.1 Star Schema
17.3.2 Snowflake Schema
17.3.3 Fact Constellation
17.3.4 Specific Schema Issues
17.3.4.1 Surrogate Keys
17.3.4.2 Granularity of the Fact Table
17.3.4.3 Factless Fact Tables
17.3.4.4 Optimizing the Dimension Tables
17.3.4.5 Defining Junk Dimensions
17.3.4.6 Defining Outrigger Tables
17.3.4.7 Slowly Changing Dimensions
17.3.4.8 Rapidly Changing Dimensions
17.4 The Extraction, Transformation, and Loading (ETL) Process
17.5 Data Marts
17.6 Virtual Data Warehouses and Virtual Data Marts
17.7 Operational Data Store
17.8 Data Warehouses versus Data Lakes
17.9 Business Intelligence
17.9.1 Query and Reporting
17.9.2 Pivot Tables
17.9.3 On-Line Analytical Processing (OLAP)
17.9.3.1 MOLAP
17.9.3.2 ROLAP
17.9.3.3 HOLAP
17.9.3.4 OLAP Operators
17.9.3.5 OLAP Queries in SQL
Summary
Problems and Exercises
18 Data Integration, Data Quality, and Data Governance
18.1 Data and Process Integration
18.1.1 Convergence of Analytical and Operational Data Needs
18.1.2 Data Integration and Data Integration Patterns
18.1.2.1 Data Consolidation: Extract, Transform, Load (ETL)
18.1.2.2 Data Federation: Enterprise Information Integration (EII)
18.1.2.3 Data Propagation: Enterprise Application Integration (EAI)
18.1.2.4 Data Propagation: Enterprise Data Replication (EDR)
18.1.2.5 Changed Data Capture (CDC), Near-Real-Time ETL, and Event Processing
18.1.2.6 Data Virtualization
18.1.2.7 Data as a Service and Data in the Cloud
18.1.3 Data Services and Data Flows in the Context of Data and Process Integration
18.1.3.1 Business Process Integration
18.1.3.2 Patterns for Managing Sequence Dependencies and Data Dependencies in Processes
18.1.3.3 A Unified View on Data and Process Integration
18.2 Searching Unstructured Data and Enterprise Search
18.2.1 Principles of Full-Text Search
18.2.2 Indexing Full-Text Documents
18.2.3 Web Search Engines
18.2.4 Enterprise Search
18.3 Data Quality and Master Data Management
18.4 Data Governance
18.4.1 Total Data Quality Management (TDQM)
18.4.2 Capability Maturity Model Integration (CMMI)
18.4.3 Data Management Body of Knowledge (DMBOK)
18.4.4 Control Objectives for Information and Related Technology (COBIT)
18.4.5 Information Technology Infrastructure Library
18.5 Outlook
18.6 Conclusion
Problems and Exercises
19 Big Data
19.1 The 5 Vs of Big Data
19.2 Hadoop
19.2.1 History of Hadoop
19.2.2 The Hadoop Stack
19.2.2.1 The Hadoop Distributed File System
19.2.2.2 MapReduce
19.2.2.3 Yet Another Resource Negotiator
19.3 SQL on Hadoop
19.3.1 HBase: The First Database on Hadoop
19.3.2 Pig
19.3.3 Hive
19.4 Apache Spark
19.4.1 Spark Core
19.4.2 Spark SQL
19.4.3 MLlib, Spark Streaming, and GraphX
19.5 Conclusion
Problems and Exercises
20 Analytics
20.1 The Analytics Process Model
20.2 Example Analytics Applications
20.3 Data Scientist Job Profile
20.4 Data Pre-Processing
20.4.1 Denormalizing Data for Analysis
20.4.2 Sampling
20.4.3 Exploratory Analysis
20.4.4 Missing Values
20.4.5 Outlier Detection and Handling
20.5 Types of Analytics
20.5.1 Predictive Analytics
20.5.1.1 Linear Regression
20.5.1.2 Logistic Regression
Logistic Regression Properties
20.5.1.3 Decision Trees
Splitting Decision
Stopping Decision
Decision Tree Properties
Regression Trees
20.5.1.4 Other Predictive Analytics Techniques
20.5.2 Evaluating Predictive Models
20.5.2.1 Splitting Up the Dataset
20.5.2.2 Performance Measures for Classification Models
20.5.2.3 Performance Measures for Regression Models
20.5.2.4 Other Performance Measures for Predictive Analytical Models
20.5.3 Descriptive Analytics
20.5.3.1 Association Rules
Basic Setting
Support, Confidence, and Lift
Post-Processing Association Rules
20.5.3.2 Sequence Rules
20.5.3.3 Clustering
Hierarchical Clustering
K-means Clustering
20.5.4 Social Network Analytics
20.5.4.1 Social Network Definitions
20.5.4.2 Social Network Metrics
20.5.4.3 Social Network Learning
20.6 Post-Processing of Analytical Models
20.7 Critical Success Factors for Analytical Models
20.8 Economic Perspective on Analytics
20.8.1 Total Cost of Ownership (TCO)
20.8.2 Return on Investment
20.8.3 In- versus Outsourcing
20.8.4 On-Premises versus Cloud Solutions
20.8.5 Open-Source versus Commercial Software
20.9 Improving the ROI of Analytics
20.9.1 New Sources of Data
20.9.2 Data Quality
20.9.3 Management Support
20.9.4 Organizational Aspects
20.9.5 Cross-Fertilization
20.10 Privacy and Security
20.10.1 Overall Considerations Regarding Privacy and Security
20.10.2 The RACI Matrix
20.10.3 Accessing Internal Data
20.10.3.1 Anonymization
20.10.3.2 SQL Views
20.10.3.3 Label-Based Access Control
20.10.4 Privacy Regulation
20.11 Conclusion
Problems and Exercises
Appendix Using the Online Environment
How to Access the Online Environment
Environment: Relational Databases and SQL
Environment: MongoDB
Environment: Neo4j and Cypher
Environment: Tree Structure Visualizations
Environment: HBase
Glossary
Index
Endorsements
← Prev
Back
Next →
← Prev
Back
Next →