Preface

Why Is This Book Needed?

Once upon a time, during those thrilling yesteryears of information technology history (circa the late 1960s and early 1970s, when “Information Technology” was called “Electronic Data Processing,” abbreviated “EDP” -- not “IT”), an IBM computer scientist, Edgar F. “Ted” Codd, conceived and published his theories on the relational model for database management.1 Upon his pioneering shoulders have stood many IT professionals, authors and entrepreneurs, who consequently evolved civilization’s contemporary relational database management systems (RDBMS) along with a thriving, competitive, multibillion-dollar worldwide marketplace. Among them were Raymond F. Boyce and Donald D. Chamberlin (also IBM computer scientists), who co-developed SQL (the Structured Query Language), which itself has evolved to become the most familiar and frequently used language for accessing and manipulating RDBMS data.2,3

Many years of research, product innovations, versions and releases have blessed humanity with a variety of robust, modern-day, commercially available, SQL-based relational database products (e.g., Oracle DB, Microsoft SQL Server, and IBM Db2). Relational databases are now the foundation of many mission-critical, enterprise-scale applications such as ERP (Enterprise Resource Planning), CRM (Customer Relationship Management), and SCM (Supply Chain Management) systems -- driving online transaction processing (OLTP), analytical processing (OLAP) and other operational databases for countless customers worldwide.

And you -- the curious but skeptical reader -- are likely a veteran IT data professional possibly with substantial relational database background and experience. Your successful career, having yielded uninterrupted paychecks, has focused on the mainstream of IT with its apparent fixation on relational databases -- and likely you’ve productively contributed to, and lucratively benefitted from, a variety of IT projects, such as:

Implementation of commercial applications using an RDBMS
Development of custom OLTP applications using SQL for accessing an RDBMS
Implementation of OLAP, Business Intelligence (BI), or Data Warehouse (DW) applications using an RDBMS
Administration of an RDBMS -- or management of, or collaboration with, RDBMS database administrators (DBAs)
Development of data models or database designs for RDBMS applications

You, an unrelenting student of IT, owe much of your professional success to your unquenchable thirst for knowledge. You’ve acquired and maintained your knowledge of many of the above mainstream subject areas. Powered by the confidence of that knowledge, you have become the “go to” gal or guy for pontificating opinions on many data management challenges facing your enterprise. As a result, perhaps you are (or will someday ascend as) the CDO -- the Chief Data Officer -- for your enterprise, flaunting your RDBMS subject matter expertise. The data that defines the business world, from your perspective (IT life as you know it), rests by and large on relational database technology, your professional and technical comfort zone.

But your conscience, that inner silent voice, cautions … “You can never rest on your knowledge laurels and stop studying IT” … because …

“Once you stop learning, you start dying” (Albert Einstein)

and …

“If you rest, you rust” (Helen Hayes).

Inspired by a growing number of interactions with other IT professionals …

Colleagues returning from cloud conferences with unorthodox ideas about storing data as documents without table structures,
App developers (young, recent-hire upstarts … whippersnappers) hyping MongoDB and other (gasp!) NoSQL solutions,
IT friends employed at other enterprises … spouting about their forced migrations to the cloud and their acquaintance with new cloud database services,

… you begin to suspect that times are changing for the relational status quo … that there’s now much more to IT data life than relational stuff. They toss around numerous non-relational concepts, buzzwords, and acronyms:

Graph databases, document databases, NoSQL query languages, GQLs, in-memory databases, cache server clusters, cloud database platform services, MPP, immutable ledger databases, blockchain technology, etc.

You are also overhearing scuttlebutt about new use cases and applications that appeal to non-relational solutions, and that new non-relational cloud data services are now ready for prime time … ready for serious consideration:

Graph databases are now commonplace for building recommender engines for social media and online retail suggestions. The gossip? Such apps with “highly connected” data sets and complex bill-of-material relationships have much better response times and are easier to implement with graph databases and their allegedly more intuitive NoSQL graph query languages.
New cloud database services deliver astonishingly fast sub-second latency for highly available, global apps (e.g., online retail, gaming, etc.) and readily support tens of thousands of concurrent users -- using NoSQL in-memory databases, data partitioning, cache clusters, and columnar storage.
“Agile” web app developers are lobbying for a shift to document databases to facilitate app development … “more intuitive and more scalable” they say, than complex relational multi-table joins.
C-level executives (CxOs) with cyber-security nightmares about threats to data confidentiality, integrity and availability -- espionage of IP, PII, PCI, and HIPAA data … ransomware … denial of service attacks … fraud … sabotage. Can the implied immutability of new ledger databases and blockchain technology mitigate such threats?

No worries … change is a good thing … you’re optimistic! You’ve grown (and not groaned) through many IT paradigm shifts throughout your career (e.g., many OS, ERP, and RDBMS mergers, acquisitions, and migrations). Now a new paradigm … a new challenge …

from Relational to Non-Relational!

You envision that these new concepts and use cases can be exploited as opportunities to amass additional IT knowledge, wield intellectual power, speak with professional authority beyond your RDBMS comfort zone … and you can emerge as a renowned non-relational hero! So as always, once again, you will be open-minded to these non-relational approaches now available in the cloud.

Fortunately, your enterprise is likely already familiar with moving apps to the Amazon cloud (AWS), having either begun the planning process or actually transitioned pieces of your apps to the cloud (i.e., the low-hanging fruit -- the web servers and app servers that “reach back” to databases in your private data centers). But now your insatiable CxOs want more cost savings and additional data center footprint reduction. They now want your database servers moved to the cloud! They also want to understand these new non-relational database services! Why? Because of new “business imperatives” for competitive advantage: faster response times for a new global shopping cart app, prototype recommender systems, faster time-to-market for app development, cyber-security and data immutability … blah, blah, blah. And they want you to deliver presentations, plans and proposals for all of the above … pronto!

So you, like many IT professionals worldwide, are currently confronted with the challenge of quickly absorbing basic concepts surrounding the ever-growing variety of AWS database services. You will need to do some cramming! As before, because of traditionally tight training budgets, you once again proactively google “AWS documentation,” this time for unravelling AWS database services. You are shocked by the many user guides, developer guides, and programming reference documents -- over 20 documents -- covering an impressive array of services. You hunger for a high-level summary -- with diagrams and pictures worth thousands of words -- that you can rapidly reuse for promptly producing your presentation. Alas, you quickly conclude that no such synopsis exists.

Introductory videos stun you with the robust collection of AWS database services offering many options for migrating existing relational databases, creating non-relational databases, and many new data concepts beyond your relational security blanket. You also astutely absorb from the user guides that AWS database services are described by a plush “Tower-of-Babel” vocabulary of many new fundamental concepts -- far deeper than water cooler and happy-hour discussions with IT colleagues:

Cluster Nodes, DB Instances, Event Subscriptions, Option Groups, Parameter Groups, Replication Groups, Node Groups, Ledger Databases, Merkle Trees, etc.

Along with basic cloud infrastructure concepts:

Regions, Availability Zones, Accounts, Virtual Private Clouds, Subnets, Instances, Security Groups, Subnet Groups, Resources, etc.

Their esoteric terminology and concepts are inextricably intertwined by many relationships and described by thousands of pages of detailed, technical prose, supported by many intuitive diagrams. Yet nowhere within this vast compendium of documentation can you, a veteran shepherd of data, find the one thing that would allow you to develop an instinct for how all of these pieces fit together. You suspiciously wonder …

Is there no data model?

What’s a data model? As a data professional, you understand that a data model is primarily used for designing databases. But a conceptual data model can serve as a mind map for visually understanding basic concepts (entities) and relationships between entities for any complex subject area -- not only a database. Your data modeling intuition kicks in -- you pull out your trusty yellow lined legal pad and you begin to doodle these newly discovered AWS database concepts as entities. Separate boxes indicate Clusters, DB Instances, Parameter Groups, Parameters, Option Groups, Replication Groups, Node Groups, and many more concepts -- drawing lines in between to represent the complex relationships connecting these entities. But then, you ponder to yourself …

“Where’s the data model for each service?”
“Is there a summary data model comparing and contrasting all of the various AWS database services?”

You quickly skim through all of the user guides. No data model found! You peruse other related docs … again none found. How about general web searches … image searches … again none found! Sadly, it appears that no persuasive, compelling data model has yet been proposed or published for AWS database services. You will need to slog through thousands of pages. You’d rather tolerate a root canal … a colonoscopy … or a frontal lobotomy!

Unmistakably, your IT instincts suggest to you that the cycle time for achieving a confident overall understanding of AWS database services can be substantially squeezed with the help of a conceptual data model summarizing their many concepts and interrelationships. A concise data model can easily clarify your confusion, dissipate the daze, and diminish the duration for coming up to speed with AWS database services.
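As a rough illustration of the legal-pad doodle described above, a conceptual data model can be jotted down as a handful of entities plus the relationships between them. The sketch below is only a minimal example: the class names (Entity, Relationship, ConceptualModel), the verbs, and the cardinalities are assumptions made for illustration and are not taken from the AWS documentation or from the diagrams later in this book.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Entity:
    """A box on the legal pad: one essential concept."""
    name: str


@dataclass(frozen=True)
class Relationship:
    """A line between two boxes, read as '<parent> <verb> <child>'."""
    parent: Entity
    child: Entity
    verb: str
    cardinality: str  # e.g. "one-to-many"


@dataclass
class ConceptualModel:
    """Hypothetical container for entities and relationships."""
    entities: List[Entity] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

    def add_entity(self, name: str) -> Entity:
        entity = Entity(name)
        self.entities.append(entity)
        return entity

    def relate(self, parent: Entity, child: Entity, verb: str,
               cardinality: str = "one-to-many") -> Relationship:
        relationship = Relationship(parent, child, verb, cardinality)
        self.relationships.append(relationship)
        return relationship

    def describe(self) -> List[str]:
        """Return one readable sentence per relationship."""
        return [
            f"{r.parent.name} {r.verb} {r.child.name} ({r.cardinality})"
            for r in self.relationships
        ]


model = ConceptualModel()
cluster = model.add_entity("Cluster")
instance = model.add_entity("DB Instance")
group = model.add_entity("Parameter Group")
parameter = model.add_entity("Parameter")

# Verbs and cardinalities below are assumed for illustration only.
model.relate(cluster, instance, "contains")
model.relate(group, parameter, "contains")
model.relate(group, instance, "configures")

for sentence in model.describe():
    print(sentence)
```

Even this toy version hints at the appeal: each relationship reads as a short sentence, which is far quicker to scan than pages of prose.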

So here, to the best of this author’s knowledge, is likely the first attempt at publishing a conceptual data model for AWS database services. Let this be a testimony to the hope and trust that data modeling can possibly serve as a fresh approach for unravelling the knots of any obscure, nebulous, and highly technical subject area. The venture of attempting to quickly comprehend and contrast AWS database services is indeed one such thorny challenge. Herein you will enjoy numerous, and hopefully intuitive, entity-relationship diagrams, the result of an earnest effort to demystify the complexities of AWS database services.

This endeavor is a consequence of reverse engineering and visually interpreting the narrative found within the library of AWS Database and EC2 documentation,4 at the time of this publication, specifically:

Amazon Relational Database Service, API Reference (API Version 2014-10-31)
Amazon Aurora, User Guide for Aurora (API Version 2014-10-31)
Amazon Neptune, User Guide (API Version 2017-11-29)
Amazon DocumentDB, Developer Guide (API Version 2014-10-31)
Amazon Redshift, API Reference (API Version 2012-12-01)
Amazon DynamoDB, Developer Guide (API Version 2012-08-10)
Amazon ElastiCache, API Reference (API Version 2015-02-02)
Amazon ElastiCache for Redis, ElastiCache for Redis User Guide (API Version 2015-02-02)
Amazon ElastiCache, ElastiCache for Memcached User Guide (API Version 2015-02-02)
Amazon Elastic Compute Cloud, User Guide for Linux Instances (2016)
Amazon Elastic Compute Cloud, API Reference (API Version 2016-11-15)
Amazon Quantum Ledger Database (Amazon QLDB): Developer Guide (API version: 2019-01-02, Latest documentation update: September 10, 2019)

Your feedback, suggestions, and improvement concepts, via social media or otherwise, are eagerly welcomed; and in conjunction with any major updates to the aforementioned references combined with God’s willingness, will serve as the basis for future editions.

Who Should Read This Book?

Many IT professionals from various walks of IT life will benefit from the content herein:

Developers of Relational Applications and Services -- whose comfort zone is design, implementation, deployment and maintenance of applications implemented using SQL-based, relational database management systems -- and who are now faced with the challenge of migrating existing relational databases to the cloud or creating new relational databases in the cloud -- and need to quickly grasp the basics of AWS database services providing relational database options.

Developers of Non-Relational Applications and Services -- whose comfort zone is design, implementation, deployment and maintenance of applications implemented using non-relational approaches (NoSQL, graph databases, document databases, key-value stores, in-memory databases, in-memory caching) -- and who are now faced with the challenge of migrating their applications to the cloud or creating new non-relational databases in the cloud -- and need to quickly comprehend the basics of AWS database services providing non-relational database options.

Enterprise Architects -- especially Data Architects but also Application Architects, Technology Architects, and Business Architects who have been challenged by the prospect of database cloud migration, design of new relational or non-relational applications (using NoSQL, graph databases, document databases, in-memory databases, in-memory caching) and who will be called upon for technical presentations, opinions, demos, or participation in a proof-of-concept.

Database Educators/Instructors/Teachers -- especially those anticipating a need for developing and delivering courseware focused on cloud-based database services. The content herein provides a unique and visual approach as a jump-start launching point for inspiring and quickly enabling presentation material for introducing, summarizing, comparing, and contrasting contemporary cloud-based data services.

Data Management Professionals interested in learning how to apply conceptual data modeling techniques as an approach to describing any highly complex, technical subject matter, not just for describing or designing familiar, commonplace information systems.

IT Professionals Acting in Support Roles -- other than IT applications and services engineering (e.g., operations, application orchestration, continuous integration and delivery, etc.), who will need to support migrated cloud apps and databases and, therefore, also need to quickly absorb AWS database services concepts.

Any IT Professional Seeking an Introductory Data Modeling Tutorial explaining diagram syntax and semantics, along with many robust examples of data modeling diagrams.

Database Administrators (DBAs) with limited conceptual data modeling experience, appreciation or confidence, who wish to expand their skills and responsibilities to exploit conceptual data modeling as a method for formalizing requirements, project scoping, and effective communication among all project stakeholders.

Any IT professional who is currently or formerly conversant with one or more AWS database services but would like to review these conceptual models for rapidly refreshing, comparing and contrasting their comprehension of AWS database services.

Anyone with curiosity for considering an example of how a conceptual data model can be used to graphically portray information requirements and concept structure of virtually any perplexing discipline or area of interest, e.g., Cyber-security, Machine Learning, Internet of Things, Blockchain technology, Linear Algebra, Particle Physics, Organic Chemistry, ad nauseam.

What You Will Find

This document will take you deep down the “rabbit hole” into the wonderland of Amazon Database Services -- for a tour of their fundamental concepts and related details, visually supplemented by many intuitive data modeling diagrams. Here’s a brief summary of what you will find:

Introduction: The tour begins by providing answers to elementary but crucial questions: What are the Amazon Database Services and Amazon EC2? What is a Data Model? What is a Conceptual Data Model? You will appreciate that AWS database services are cloud platform services (Platform as a Service -- PaaS). Amazon EC2 provides Infrastructure as a Service (IaaS, the foundation upon which all of the Amazon Database Services are built) whereby a customer account dynamically provisions infrastructure components (e.g., virtual servers, firewalls, etc.). A data model can be used for visually describing any subject area’s essential concepts and relationships. The state of the practice for accomplishing this is to create a specific type of data model, commonly called a conceptual data model, which provides a high-level understanding of essential concepts and relationships. Chapters 2 through 9 will provide a conceptual data model for each of the Amazon Database Services -- illustrating basic and critical entities (i.e., essential concepts) included on many entity-relationship diagrams (ERDs). Created using the ER/Studio Data Architect tool (a product of IDERA, Inc.), the ERDs use a common data modeling diagramming style known as Information Engineering (IE) notation.

Chapter 1 – Amazon EC2 Basics: Primitive EC2 concepts (Regions, Availability Zones, Images, Accounts, Instances, Instance Types, VPCs, Subnets, Security Groups, VPC Instances, Classic Instances) are defined and illustrated via simple ERDs. These initial ERDs serve as the basis for providing a tutorial on data modeling concepts and Information Engineering notation: entities, relationships, attributes, Crow’s Feet Notation, one-to-many relationships, many-to-many relationships, dependent entities, optional vs. mandatory relationships, identifying vs. non-identifying relationships, foreign keys, primary keys, super-types and sub-types.5

Chapter 2 – Amazon Relational Database Service (RDS): The deep dive into AWS database services concepts begins with RDS, a managed web service for implementing familiar relational database management systems (i.e., Oracle DB, Microsoft SQL Server, MySQL, MariaDB, and PostgreSQL) … including definitions and supporting ERDs illustrating concepts such as Database Instances, Option Groups, Options, Event Notification, Reserved Database Instances, Database Backups, Database Logs and Log Types.

Chapter 3 – Amazon Aurora: Aurora is a managed service supporting clusters of database servers for familiar open source RDBMS engines (MySQL and PostgreSQL) … including definitions and supporting ERDs illustrating concepts such as Aurora clusters, a primary DB instance, Read Replica database instances, virtual cluster volumes, DB cluster snapshots, cross-region DB clusters, backtracking, and Serverless clusters.

Chapter 4 – Amazon Neptune: Amazon Neptune is a managed AWS database service supporting graph database engines accessed by NoSQL graph query languages (GQLs). Other topics include graph data structures, highly connected datasets, Gremlin and SPARQL GQLs, recommender engines, labelled property graphs, vertices, edges, and properties. Supporting ERDs illustrate concepts such as Neptune clusters compared and contrasted with Aurora clusters.

Chapter 5 – Amazon DocumentDB: DocumentDB is another managed database service offered by AWS, providing clusters of document database servers and compatibility with MongoDB. DocumentDB is also categorized as a NoSQL database service. Basic document concepts (e.g., collections, documents, fields, embedded documents, etc.) are introduced along with MongoDB operations for creating, reading, updating, and deleting documents. Supporting ERDs illustrate DocumentDB clusters in contrast with Neptune and Aurora clusters.

Chapter 6 – Amazon Redshift: Redshift, based on PostgreSQL, is another managed data service offered by AWS, and targets applications requiring efficient access to very large data sets (e.g., BIDW, OLAP, ETL, etc.). Redshift also supports clusters of database nodes (rather than DB instances). Redshift basic concepts -- leader nodes, compute nodes, massively parallel processing (MPP), partitioned data sets, slices, columnar storage, table restore requests, etc. -- are described. ERDs illustrate Redshift clusters contrasted with DocumentDB, Neptune and Aurora clusters.

Chapter 7 – Amazon DynamoDB: DynamoDB is another NoSQL, managed AWS database service supporting structured and semi-structured data stored as key-value pairs and documents (JSON files). DynamoDB is serverless and ideal for implementing global, internet-scale apps (e.g., shopping carts, gaming, bidding, etc.). DynamoDB basic concepts are described: tables, global tables, global/local indexes, auto scaling, streams, DynamoDB Accelerator (DAX), cache clusters, caching strategies, DAX clusters, DAX nodes, etc., along with supporting ERDs.

Chapter 8 – Amazon ElastiCache: ElastiCache is another AWS managed caching service for implementing scalable clusters of cache nodes supporting both Memcached and Redis cache engines. ElastiCache basic concepts are described (e.g., caching strategies, lazy loading, data structure servers, Redis commands, Memcached commands, ElastiCache clusters, Redis standalone clusters, Redis replication groups, snapshots, reserved instances, node type parameters, etc.) along with supporting ERDs.

Chapter 9 – Amazon Quantum Ledger Database (QLDB): QLDB, a managed AWS database service, is intriguingly different from previously described AWS database services. QLDB focuses on apps requiring a ledger database, a repository for System of Record (SOR) apps requiring complete transaction history. A QLDB ledger blends a smorgasbord of concepts from relational, document, and blockchain paradigms. The following QLDB basic concepts, summarized with supporting ERDs, are described: SQL access to tables of documents (via PartiQL, a SQL-like language); Amazon Ion format (a superset of JSON); QLDB’s built-in journal of change history; Blockchain concepts (i.e., the journal is an append only, immutable chain of blocks); Verifiability of data integrity (i.e., via SHA-256 hash codes, Merkle Trees, and Merkle Audit Proofs).

Appendix: Summary ERDs for each of Chapters 1 through 9 are replicated, as a matter of convenience, enabling periodic and relatively rapid review of AWS database services and EC2 essential concepts and relationships.
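The Chapter 9 summary above mentions verifying data integrity with SHA-256 hash codes and Merkle Trees. For readers who have not met the idea before, here is a minimal, generic sketch of a Merkle root computed over a list of journal blocks. It is not QLDB's actual algorithm or API: the function name merkle_root, the sample block contents, and the choice to duplicate the last hash on odd-sized levels (one common convention) are all assumptions made for illustration.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    """Return the raw 32-byte SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Compute a Merkle root by hashing pairs of hashes, level by level."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: duplicate the last hash on odd-sized levels.
            level.append(level[-1])
        level = [
            sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical journal contents, for illustration only.
journal = [b"block-1", b"block-2", b"block-3"]
root = merkle_root(journal)

# Changing any single block yields a different root.
tampered = merkle_root([b"block-1", b"block-X", b"block-3"])

print(root.hex())
print(root != tampered)
```

The point of the sketch is the property, not the details: one short digest summarizes the whole journal, so any alteration to a past block shows up as a mismatch in the root.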

About The Author

Henry M. Nirsberger, also the author of A Conceptual Data Model for Amazon EC2, earned an MS in Systems & Information Science at Syracuse University and a BA in Mathematics at Le Moyne College. His professional IT career spans five decades, starting as a software engineer with Honeywell Information Systems in 1974 and continuing with GE in 1979. While at GE, he assumed a variety of technical, project management, and management positions, culminating as a Senior Architect for GE’s Corporate Enterprise Architecture Team, having gratefully contributed to hundreds of IT systems and applications development projects. Currently the CEO of HMN Consulting LLC, specializing in enterprise architecture and data management consulting services, his past professional certifications include CDMP (DAMA), CBIP (TDWI), CFPIM (APICS 1984–2003) and TOGAF 9. As a trained facilitator (MGR Consulting), he facilitated over 600 IT design and planning sessions for data modeling, process modeling, database design, project planning, process improvement, requirements consensus and team building. Presently, he remains a student of data modeling, cloud computing, enterprise architecture, cyber-security, and all aspects of data management.

Acknowledgments

The author gratefully appreciates and acknowledges the efforts of the editors and reviewers of this endeavor. Editing services were adeptly delivered by Bernard E. Durfee (Lambda School), William Rielly (Principal Analytics Engineer, CDPHP), Jennifer K. Senich (VP Operations, Ithos), and Addie Nirsberger (the author’s perfect 10 wife, best friend, and soul mate). The author also salutes Bernard E. Durfee for inspiration and ideas used for portions of narrative in the introduction chapter. The book cover was designed by Bob Sacca, one of the author’s all-time favorite GE managers. The author is also grateful to Joy Ruff (Product Marketing Manager, IDERA) for exemplary customer support and providing an evaluation license of IDERA ER/Studio Data Architect from which all entity-relationship diagrams herein were created. The author also recognizes the outstanding support and infinite patience demonstrated by the Amazon AWS Support Center, whose technical support staff rapidly responded and proficiently resolved 79 cases for questions on all Amazon database services raised by the author over a period of several months.

Dedication

This effort is dedicated to Mom and Dad … Anna and Michael … intrepid immigrants to America … in search of freedom and opportunity. The memory of your love, laughter, values, determination, and encouragement for continuous learning inspire me each day.

Henry M. Nirsberger

Clifton Park, New York, USA

Henry.Nirsberger@gmail.com