Log In
Or create an account -> 
Imperial Library
  • Home
  • About
  • News
  • Upload
  • Forum
  • Help
  • Login/SignUp

Index
Half title Reviews Title page Imprints page Brief Contents Contents About the Authors Preface
Who This Book is For Topics Covered in this Book How to Read this Book Cross-Chapter Case Study: Sober Additional Material Acknowledgments
Part I Databases and Database Design
Contents 1 Fundamental Concepts of Database Management
1.1 Applications of Database Technology 1.2 Key Definitions 1.3 File versus Database Approach to Data Management
1.3.1 The File-Based Approach 1.3.2 The Database Approach
1.4 Elements of a Database System
1.4.1 Database Model versus Instances 1.4.2 Data Model 1.4.3 The Three-Layer Architecture 1.4.4 Catalog 1.4.5 Database Users 1.4.6 Database Languages
1.5 Advantages of Database Systems and Database Management
1.5.1 Data Independence 1.5.2 Database Modeling 1.5.3 Managing Structured, Semi-Structured, and Unstructured Data 1.5.4 Managing Data Redundancy 1.5.5 Specifying Integrity Rules 1.5.6 Concurrency Control 1.5.7 Backup and Recovery Facilities 1.5.8 Data Security 1.5.9 Performance Utilities
Summary Problems and Exercises
2 Architecture and Categorization of DBMSs
2.1 Architecture of a DBMS
2.1.1 Connection and Security Manager 2.1.2 DDL Compiler 2.1.3 Query Processor
2.1.3.1 DML Compiler 2.1.3.2 Query Parser and Query Rewriter 2.1.3.3 Query Optimizer 2.1.3.4 Query Executor
2.1.4 Storage Manager
2.1.4.1 Transaction Manager 2.1.4.2 Buffer Manager 2.1.4.3 Lock Manager 2.1.4.4 Recovery Manager
2.1.5 DBMS Utilities 2.1.6 DBMS Interfaces
2.2 Categorization of DBMSs
2.2.1 Categorization Based on Data Model
2.2.1.1 Hierarchical DBMSs 2.2.1.2 Network DBMSs 2.2.1.3 Relational DBMSs 2.2.1.4 Object-Oriented DBMSs 2.2.1.5 Object-Relational/Extended Relational DBMSs 2.2.1.6 XML DBMSs 2.2.1.7 NoSQL DBMSs
2.2.2 Categorization Based on Degree of Simultaneous Access 2.2.3 Categorization Based on Architecture 2.2.4 Categorization Based on Usage
Summary Problems and Exercises
3 Conceptual Data Modeling Using the (E)ER Model and UML Class Diagram
3.1 Phases of Database Design 3.2 The Entity Relationship Model
3.2.1 Entity Types 3.2.2 Attribute Types
3.2.3.1 Domains 3.2.3.2 Key Attribute Types 3.2.3.3 Simple versus Composite Attribute Types 3.2.3.4 Single-Valued versus Multi-Valued Attribute Types 3.2.3.5 Derived Attribute Type
3.2.4 Relationship Types
3.2.4.1 Degree and Roles 3.2.4.2 Cardinalities 3.2.4.3 Relationship Attribute Types
3.2.5 Weak Entity Types 3.2.6 Ternary Relationship Types 3.2.7 Examples of the ER Model 3.2.8 Limitations of the ER Model
3.3 The Enhanced Entity Relationship (EER) Model
3.3.1 Specialization/Generalization 3.3.2 Categorization 3.3.3 Aggregation 3.3.4 Examples of the EER Model 3.3.5 Designing an EER Model
3.4 The UML Class Diagram
3.4.1 Recap of Object Orientation 3.4.2 Classes 3.4.3 Variables 3.4.4 Access Modifiers 3.4.5 Associations
3.4.5.1 Association Class 3.4.5.2 Unidirectional versus Bidirectional Association 3.4.5.3 Qualified Association
3.4.6 Specialization/Generalization 3.4.7 Aggregation 3.4.8 UML Example 3.4.9 Advanced UML Modeling Concepts
3.4.9.1 Changeability Property 3.4.9.2 Object Constraint Language (OCL) 3.4.9.3 Dependency Relationship
3.4.10 UML versus EER
Summary Problems and Exercises
4 Organizational Aspects of Data Management
4.1 Data Management
4.1.1 Catalogs and the Role of Metadata 4.1.2 Metadata Modeling 4.1.3 Data Quality
4.1.3.1 Data Quality Dimensions
Accuracy Completeness Consistency Accessibility
4.1.3.2 Data Quality Problems
4.1.4 Data Governance
4.2 Roles in Data Management
4.2.1 Information Architect 4.2.2 Database Designer 4.2.3 Data Owner 4.2.4 Data Steward 4.2.5 Database Administrator 4.2.6 Data Scientist
Summary Problems and Exercises
Part II Types of Database Systems
Contents 5 Legacy Databases
5.1 The Hierarchical Model 5.2 The CODASYL Model Summary Problems and Exercises
6 Relational Databases
6.1 The Relational Model
6.1.1 Basic Concepts 6.1.2 Formal Definitions 6.1.3 Types of Keys
6.1.3.1 Superkeys and Keys 6.1.3.2 Candidate Keys, Primary Keys, and Alternative Keys 6.1.3.3 Foreign Keys
6.1.4 Relational Constraints 6.1.5 Example Relational Data Model
6.2 Normalization
6.2.1 Insertion, Deletion, and Update Anomalies in an Unnormalized Relational Model 6.2.2 Informal Normalization Guidelines 6.2.3 Functional Dependencies and Prime Attribute Types 6.2.4 Normalization Forms
6.2.4.1 First Normal Form (1 NF) 6.2.4.2 Second Normal Form (2 NF) 6.2.4.3 Third Normal Form (3 NF) 6.2.4.4 Boyce–Codd Normal Form (BCNF) 6.2.4.5 Fourth Normal Form (4 NF)
6.3 Mapping a Conceptual ER Model to a Relational Model
6.3.1 Mapping Entity Types 6.3.2 Mapping Relationship Types
6.3.2.1 Mapping a Binary 1:1 Relationship type 6.3.2.2 Mapping a Binary 1:N Relationship Type 6.3.2.3 Mapping a Binary M:N Relationship Type 6.3.2.4 Mapping Unary Relationship Types 6.3.2.5 Mapping n-ary Relationship Types
6.3.3 Mapping Multi-Valued Attribute Types 6.3.4 Mapping Weak Entity Types 6.3.5 Putting it All Together
6.4 Mapping a Conceptual EER Model to a Relational Model
6.4.1 Mapping an EER Specialization 6.4.2 Mapping an EER Categorization 6.4.3 Mapping an EER Aggregation
Summary Problems and Exercises
7 Relational Databases
7.1 Relational Database Management Systems and SQL
7.1.1 Key Characteristics of SQL 7.1.2 Three-Layer Database Architecture
7.2 SQL Data Definition Language
7.2.1 Key DDL Concepts 7.2.2 DDL Example 7.2.3 Referential Integrity Constraints 7.2.4 DROP and ALTER Command
7.3 SQL Data Manipulation Language
7.3.1 SQL SELECT Statement
7.3.1.1 Simple Queries 7.3.1.2 Queries with Aggregate Functions 7.3.1.3 Queries with GROUP BY/HAVING 7.3.1.4 Queries with ORDER BY 7.3.1.5 Join Queries
Inner Joins Outer Joins
7.3.1.6 Nested Queries 7.3.1.7 Correlated Queries 7.3.1.8 Queries with ALL/ANY 7.3.1.9 Queries with EXISTS 7.3.1.10 Queries with Subqueries in SELECT/FROM 7.3.1.11 Queries with Set Operations
7.3.2 SQL INSERT Statement 7.3.3 SQL DELETE Statement 7.3.4 SQL UPDATE Statement
7.4 SQL Views 7.5 SQL Indexes 7.6 SQL Privileges 7.7 SQL for Metadata Management Summary Problems and Exercises
8 Object-Oriented Databases and Object Persistence
8.1 Recap: Basic Concepts of OO 8.2 Advanced Concepts of OO
8.2.1 Method Overloading 8.2.2 Inheritance 8.2.3 Method Overriding 8.2.4 Polymorphism and Dynamic Binding
8.3 Basic Principles of Object Persistence
8.3.1 Serialization
8.4 OODBMS
8.4.1 Object Identifiers 8.4.2 ODMG Standard 8.4.3 Object Model 8.4.4 Object Definition Language (ODL) 8.4.5 Object Query Language (OQL)
8.4.5.1 Simple OQL Queries 8.4.5.2 SELECT FROM WHERE OQL Queries 8.4.5.3 Join OQL Queries 8.4.5.4 Other OQL Queries
8.4.6 Language Bindings
8.5 Evaluating OODBMSs Summary Problems and Exercises
9 Extended Relational Databases
9.1 Limitations of the Relational Model 9.2 Active RDBMS Extensions
9.2.1 Triggers 9.2.2 Stored Procedures
9.3 Object-Relational RDBMS Extensions
9.3.1 User-Defined Types
9.3.1.1 Distinct Data Types 9.3.1.2 Opaque Data Types 9.3.1.3 Unnamed Row Types 9.3.1.4 Named Row Types 9.3.1.5 Table Data Types
9.3.2 User-Defined Functions 9.3.3 Inheritance
9.3.3.1 Inheritance at Data Type Level 9.3.3.2 Inheritance at Table Type Level
9.3.4 Behavior 9.3.5 Polymorphism 9.3.6 Collection Types 9.3.7 Large Objects
9.4 Recursive SQL Queries Summary Problems and Exercises
10 XML Databases
10.1 Extensible Markup Language
10.1.1 Basic Concepts 10.1.2 Document Type Definition and XML Schema Definition 10.1.3 Extensible Stylesheet Language 10.1.4 Namespaces 10.1.5 XPath
10.2 Processing XML Documents 10.3 Storage of XML Documents
10.3.1 The Document-Oriented Approach for Storing XML Documents 10.3.2 The Data-Oriented Approach for Storing XML Documents 10.3.3 The Combined Approach for Storing XML Documents
10.4 Differences Between XML Data and Relational Data 10.5 Mappings Between XML Documents and (Object-) Relational Data
10.5.1 Table-Based Mapping 10.5.2 Schema-Oblivious Mapping 10.5.3 Schema-Aware Mapping 10.5.4 SQL/XML
10.6 Searching XML Data
10.6.1 Full-Text Search 10.6.2 Keyword-Based Search 10.6.3 Structured Search With XQuery 10.6.4 Semantic Search With RDF and SPARQL
10.7 XML for Information Exchange
10.7.1 Message-Oriented Middleware 10.7.2 SOAP-Based Web Services 10.7.3 REST-Based Web Services 10.7.4 Web Services and Databases
10.8 Other Data Representation Formats Summary Problems and Exercises
11 NoSQL Databases
11.1 The NoSQL Movement
11.1.1 The End of the “One Size Fits All” Era? 11.1.2 The Emergence of the NoSQL Movement
11.2 Key–Value Stores
11.2.1 From Keys to Hashes 11.2.2 Horizontal Scaling 11.2.3 An Example: Memcached 11.2.4 Request Coordination 11.2.5 Consistent Hashing 11.2.6 Replication and Redundancy 11.2.7 Eventual Consistency 11.2.8 Stabilization 11.2.9 Integrity Constraints and Querying
11.3 Tuple and Document Stores
11.3.1 Items with Keys 11.3.2 Filters and Queries 11.3.3 Complex Queries and Aggregation with MapReduce 11.3.4 SQL After All…
11.4 Column-Oriented Databases 11.5 Graph-Based Databases
11.5.1 Cypher Overview 11.5.2 Exploring a Social Graph
11.6 Other NoSQL Categories Summary Problems and Exercises
Part III Physical Data Storage, Transaction Management, and Database Access
Contents 12 Physical File Organization and Indexing
12.1 Storage Hardware and Physical Database Design
12.1.1 The Storage Hierarchy 12.1.2 Internals of Hard Disk Drives 12.1.3 From Logical Concepts to Physical Constructs
12.2 Record Organization 12.3 File Organization
12.3.1 Introductory Concepts: Search Keys, Primary, and Secondary File Organization 12.3.2 Heap File Organization 12.3.3 Sequential File Organization 12.3.4 Random File Organization (Hashing)
12.3.4.1 Key-to-Address Transformation 12.3.4.2 Factors that Determine the Efficiency of Random File Organization
12.3.5 Indexed Sequential File Organization
12.3.5.1 Basic Terminology of Indexes 12.3.5.2 Primary Indexes 12.3.5.3 Clustered Indexes 12.3.5.4 Multilevel Indexes
12.3.6 List Data Organization (Linear and Nonlinear Lists)
12.3.6.1 Linear Lists 12.3.6.2 Tree Data Structures
12.3.7 Secondary Indexes and Inverted Files
12.3.7.1 Characteristics of Secondary Indexes 12.3.7.2 Inverted Files 12.3.7.3 Multicolumn Indexes 12.3.7.4 Other Index Types
12.3.8 B-Trees and B+-Trees
12.3.8.1 Multilevel Indexes Revisited 12.3.8.2 Binary Search Trees 12.3.8.3 B-Trees 12.3.8.4 B+-Trees
Summary Problems and Exercises
13 Physical Database Organization
13.1 Physical Database Organization and Database Access Methods
13.1.1 From Database to Tablespace 13.1.2 Index Design 13.1.3 Database Access Methods
13.1.3.1 Functioning of the Query Optimizer 13.1.3.2 Index Search (with Atomic Search Key) 13.1.3.3 Multiple Index and Multicolumn Index Search 13.1.3.4 Index-Only Access 13.1.3.5 Full Table Scan
13.1.4 Join Implementations
13.1.4.1 Nested-Loop Join 13.1.4.2 Sort-Merge Join 13.1.4.3 Hash Join
13.2 Enterprise Storage Subsystems and Business Continuity
13.2.1 Disk Arrays and RAID 13.2.2 Enterprise Storage Subsystems
13.2.2.1 Overview and Classification 13.2.2.2 DAS (Directly Attached Storage) 13.2.2.3 SAN (Storage Area Network) 13.2.2.4 NAS (Network Attached Storage) 13.2.2.5 NAS Gateway 13.2.2.6 iSCSI/Storage Over IP
13.2.3 Business Continuity
13.2.3.1 Contingency Planning, Recovery Point, and Recovery Time 13.2.3.2 Availability and Accessibility of Storage Devices 13.2.3.3 Availability of Database Functionality 13.2.3.4 Data Availability
Summary Problems and Exercises
14 Basics of Transaction Management
14.1 Transactions, Recovery, and Concurrency Control 14.2 Transactions and Transaction Management
14.2.1 Delineating Transactions and the Transaction Lifecycle 14.2.2 DBMS Components Involved in Transaction Management 14.2.3 The Logfile
14.3 Recovery
14.3.1 Types of Failures 14.3.2 System Recovery 14.3.3 Media Recovery
14.4 Concurrency Control
14.4.1 Typical Concurrency Problems
14.4.1.1 Lost Update Problem 14.4.1.2 Uncommitted Dependency Problem (aka Dirty Read Problem) 14.4.1.3 Inconsistent Analysis Problem 14.4.1.4 Other Concurrency-Related Problems
14.4.2 Schedules and Serial Schedules 14.4.3 Serializable Schedules 14.4.4 Optimistic and Pessimistic Schedulers 14.4.5 Locking and Locking Protocols
14.4.5.1 Purposes of Locking 14.4.5.2 The Two-Phase Locking Protocol (2PL) 14.4.5.3 Cascading Rollbacks 14.4.5.4 Dealing with Deadlocks 14.4.5.5 Isolation Levels 14.4.5.6 Lock Granularity
14.5 The ACID Properties of Transactions Summary Problems and Exercises
15 Accessing Databases and Database APIs
15.1 Database System Architectures
15.1.1 Centralized System Architectures 15.1.2 Tiered System Architectures
15.2 Classification of Database APIs
15.2.1 Proprietary versus Universal APIs 15.2.2 Embedded versus Call-Level APIs 15.2.3 Early Binding versus Late Binding
15.3 Universal Database APIs
15.3.1 ODBC 15.3.2 OLE DB and ADO 15.3.3 ADO.NET 15.3.4 Java DataBase Connectivity (JDBC) 15.3.5 Intermezzo: SQL Injection and Access Security 15.3.6 SQLJ 15.3.7 Intermezzo: Embedded APIs versus Embedded DBMSs 15.3.8 Language-Integrated Querying
15.4 Object Persistence and Object-Relational Mapping APIs
15.4.1 Object Persistence with Enterprise JavaBeans 15.4.2 Object Persistence with the Java Persistence API 15.4.3 Object Persistence with Java Data Objects 15.4.4 Object Persistence in Other Host Languages
15.5 Database API Summary 15.6 Database Access in the World Wide Web
15.6.1 Introduction: the Original Web Server 15.6.2 The Common Gateway Interface: Toward Dynamic Web Pages 15.6.3 Client-Side Scripting: The Desire for a Richer Web 15.6.4 JavaScript as a Platform 15.6.5 DBMSs Adapt: REST, Other Web Services, and a Look Ahead
Summary Problems and Exercises
16 Data Distribution and Distributed Transaction Management
16.1 Distributed Systems and Distributed Databases 16.2 Architectural Implications of Distributed Databases 16.3 Fragmentation, Allocation, and Replication
16.3.1 Vertical Fragmentation 16.3.2 Horizontal Fragmentation (Sharding) 16.3.3 Mixed Fragmentation 16.3.4 Replication 16.3.5 Distribution and Replication of Metadata
16.4 Transparency 16.5 Distributed Query Processing 16.6 Distributed Transaction Management and Concurrency Control
16.6.1 Primary Site and Primary Copy 2PL 16.6.2 Distributed 2PL 16.6.3 The Two-Phase Commit Protocol (2PC) 16.6.4 Optimistic Concurrency and Loosely Coupled Systems 16.6.5 Compensation-Based Transaction Models
16.7 Eventual Consistency and BASE Transactions
16.7.1 Horizontal Fragmentation and Consistent Hashing 16.7.2 The CAP Theorem 16.7.3 BASE Transactions 16.7.4 Multi-Version Concurrency Control and Vector Clocks 16.7.5 Quorum-Based Consistency
Summary Problems and Exercises
Part IV Data Warehousing, Data Governance, and (Big) Data Analytics
Contents 17 Data Warehousing and Business Intelligence
17.1 Operational versus Tactical/Strategic Decision-Making 17.2 Data Warehouse Definition 17.3 Data Warehouse Schemas
17.3.1 Star Schema 17.3.2 Snowflake Schema 17.3.3 Fact Constellation 17.3.4 Specific Schema Issues
17.3.4.1 Surrogate Keys 17.3.4.2 Granularity of the Fact Table 17.3.4.3 Factless Fact Tables 17.3.4.4 Optimizing the Dimension Tables 17.3.4.5 Defining Junk Dimensions 17.3.4.6 Defining Outrigger Tables 17.3.4.7 Slowly Changing Dimensions 17.3.4.8 Rapidly Changing Dimensions
17.4 The Extraction, Transformation, and Loading (ETL) Process 17.5 Data Marts 17.6 Virtual Data Warehouses and Virtual Data Marts 17.7 Operational Data Store 17.8 Data Warehouses versus Data Lakes 17.9 Business Intelligence
17.9.1 Query and Reporting 17.9.2 Pivot Tables 17.9.3 On-Line Analytical Processing (OLAP)
17.9.3.1 MOLAP 17.9.3.2 ROLAP 17.9.3.3 HOLAP 17.9.3.4 OLAP Operators 17.9.3.5 OLAP Queries in SQL
Summary Problems and Exercises
18 Data Integration, Data Quality, and Data Governance
18.1 Data and Process Integration
18.1.1 Convergence of Analytical and Operational Data Needs 18.1.2 Data Integration and Data Integration Patterns
18.1.2.1 Data Consolidation: Extract, Transform, Load (ETL) 18.1.2.2 Data Federation: Enterprise Information Integration (EII) 18.1.2.3 Data Propagation: Enterprise Application Integration (EAI) 18.1.2.4 Data Propagation: Enterprise Data Replication (EDR) 18.1.2.5 Changed Data Capture (CDC), Near-Real-Time ETL, and Event Processing 18.1.2.6 Data Virtualization 18.1.2.7 Data as a Service and Data in the Cloud
18.1.3 Data Services and Data Flows in the Context of Data and Process Integration
18.1.3.1 Business Process Integration 18.1.3.2 Patterns for Managing Sequence Dependencies and Data Dependencies in Processes 18.1.3.3 A Unified View on Data and Process Integration
18.2 Searching Unstructured Data and Enterprise Search
18.2.1 Principles of Full-Text Search 18.2.2 Indexing Full-Text Documents 18.2.3 Web Search Engines 18.2.4 Enterprise Search
18.3 Data Quality and Master Data Management 18.4 Data Governance
18.4.1 Total Data Quality Management (TDQM) 18.4.2 Capability Maturity Model Integration (CMMI) 18.4.3 Data Management Body of Knowledge (DMBOK) 18.4.4 Control Objectives for Information and Related Technology (COBIT) 18.4.5 Information Technology Infrastructure Library
18.5 Outlook 18.6 Conclusion Problems and Exercises
19 Big Data
19.1 The 5 Vs of Big Data 19.2 Hadoop
19.2.1 History of Hadoop 19.2.2 The Hadoop Stack
19.2.2.1 The Hadoop Distributed File System 19.2.2.2 MapReduce 19.2.2.3 Yet Another Resource Negotiator
19.3 SQL on Hadoop
19.3.1 HBase: The First Database on Hadoop 19.3.2 Pig 19.3.3 Hive
19.4 Apache Spark
19.4.1 Spark Core 19.4.2 Spark SQL 19.4.3 MLlib, Spark Streaming, and GraphX
19.5 Conclusion Problems and Exercises
20 Analytics
20.1 The Analytics Process Model 20.2 Example Analytics Applications 20.3 Data Scientist Job Profile 20.4 Data Pre-Processing
20.4.1 Denormalizing Data for Analysis 20.4.2 Sampling 20.4.3 Exploratory Analysis 20.4.4 Missing Values 20.4.5 Outlier Detection and Handling
20.5 Types of Analytics
20.5.1 Predictive Analytics
20.5.1.1 Linear Regression 20.5.1.2 Logistic Regression
Logistic Regression Properties
20.5.1.3 Decision Trees
Splitting Decision Stopping Decision Decision Tree Properties Regression Trees
20.5.1.4 Other Predictive Analytics Techniques
20.5.2 Evaluating Predictive Models
20.5.2.1 Splitting Up the Dataset 20.5.2.2 Performance Measures for Classification Models 20.5.2.3 Performance Measures for Regression Models 20.5.2.4 Other Performance Measures for Predictive Analytical Models
20.5.3 Descriptive Analytics
20.5.3.1 Association Rules
Basic Setting Support, Confidence, and Lift Post-Processing Association Rules
20.5.3.2 Sequence Rules 20.5.3.3 Clustering
Hierarchical Clustering K-means Clustering
20.5.4 Social Network Analytics
20.5.4.1 Social Network Definitions 20.5.4.2 Social Network Metrics 20.5.4.3 Social Network Learning
20.6 Post-Processing of Analytical Models 20.7 Critical Success Factors for Analytical Models 20.8 Economic Perspective on Analytics
20.8.1 Total Cost of Ownership (TCO) 20.8.2 Return on Investment 20.8.3 In- versus Outsourcing 20.8.4 On-Premises versus Cloud Solutions 20.8.5 Open-Source versus Commercial Software
20.9 Improving the ROI of Analytics
20.9.1 New Sources of Data 20.9.2 Data Quality 20.9.3 Management Support 20.9.4 Organizational Aspects 20.9.5 Cross-Fertilization
20.10 Privacy and Security
20.10.1 Overall Considerations Regarding Privacy and Security 20.10.2 The RACI Matrix 20.10.3 Accessing Internal Data
20.10.3.1 Anonymization 20.10.3.2 SQL Views 20.10.3.3 Label-Based Access Control
20.10.4 Privacy Regulation
20.11 Conclusion Problems and Exercises
Appendix Using the Online Environment
How to Access the Online Environment Environment: Relational Databases and SQL Environment: MongoDB Environment: Neo4j and Cypher Environment: Tree Structure Visualizations Environment: HBase
Glossary Index Endorsements
  • ← Prev
  • Back
  • Next →
  • ← Prev
  • Back
  • Next →

Chief Librarian: Las Zenow <zenow@riseup.net>
Fork the source code from gitlab
.

This is a mirror of the Tor onion service:
http://kx5thpx2olielkihfyo4jgjqfb7zx7wxr3sd4xzt26ochei4m6f7tayd.onion