Amazon Redshift (Redshift) -- like RDS, Aurora, Neptune and DocumentDB -- is another managed, platform-based database service offered by AWS. Like RDS and Aurora, Redshift offers a relational database service, based on PostgreSQL, and appeals to apps with very large data sets (e.g., BIDW, OLAP, ETL, and AI). Like Aurora, Neptune and DocumentDB, Redshift supports database clusters. Redshift clusters, however, are composed of nodes rather than DB instances. Where the other DB clusters pair a primary DB instance with one or more read replica DB instances, a Redshift cluster consists of a leader node assisted by one or more compute nodes.

Redshift compute nodes -- unlike the read replica DB instances in Aurora, Neptune and DocumentDB clusters -- do not serve as read replicas. Compute nodes enable massively parallel processing (MPP), which is crucial for optimizing query performance across very large partitioned data sets (i.e., table rows are distributed across compute nodes). Redshift-specific SQL syntax controls this partitioning. Redshift’s CREATE TABLE statement features a DISTSTYLE parameter for specifying how a table’s rows are distributed (e.g., by designating a table column as the distribution key for horizontal partitioning).
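Key-based distribution can be illustrated with a short Python sketch. It is a simulation only: the sales table, its columns, the three-node cluster, and the CRC32 hash are all assumptions made for illustration, and Redshift's internal hashing is not shown here. The DDL string shows the general shape of a CREATE TABLE statement that designates a distribution key.

```python
import zlib
from collections import defaultdict

# Hypothetical table; DISTSTYLE KEY with a DISTKEY column asks Redshift to
# place rows that share a customer_id value together.
SALES_DDL = """
CREATE TABLE sales (
    sale_id     INTEGER,
    customer_id INTEGER,
    amount      DECIMAL(10, 2)
)
DISTSTYLE KEY
DISTKEY (customer_id);
"""


def assign_node(dist_key_value, num_nodes):
    """Map a distribution-key value to a node index.

    CRC32 is a stand-in hash chosen for this sketch, not Redshift's own.
    """
    digest = zlib.crc32(str(dist_key_value).encode("utf-8"))
    return digest % num_nodes


def distribute_rows(rows, dist_key, num_nodes):
    """Group rows by the node their distribution-key value hashes to."""
    placement = defaultdict(list)
    for row in rows:
        placement[assign_node(row[dist_key], num_nodes)].append(row)
    return dict(placement)


rows = [
    {"sale_id": 1, "customer_id": 101, "amount": 25.00},
    {"sale_id": 2, "customer_id": 102, "amount": 40.00},
    {"sale_id": 3, "customer_id": 101, "amount": 15.50},
    {"sale_id": 4, "customer_id": 103, "amount": 60.25},
    {"sale_id": 5, "customer_id": 102, "amount": 5.75},
]
placement = distribute_rows(rows, "customer_id", num_nodes=3)
```

Because the node is derived solely from the distribution key value, all rows for one customer land on the same node, which is what allows joins and aggregations on that column to proceed without moving rows between nodes.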

Redshift accomplishes data partitioning by dividing each compute node’s memory and storage capacity into two or more slices. This enables additional parallelism within each compute node. The leader node disseminates data to compute node slices and delegates work (i.e., code based on execution plans) to be performed by each compute node slice. The leader node then coordinates and aggregates the result sets returned by the compute node slices.
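The division of labor between the leader node and the slices resembles a scatter-gather pattern, sketched below in Python. This is a toy model under stated assumptions: the function names, the thread pool standing in for slices, and the sample data are hypothetical, and real Redshift execution plans are compiled code rather than Python functions.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_partial_sum(slice_rows):
    """Work delegated to one slice: total and row count over its local rows."""
    amounts = [row["amount"] for row in slice_rows]
    return sum(amounts), len(amounts)


def leader_average(slices):
    """Leader role: fan work out to every slice, then merge partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        partials = list(pool.map(slice_partial_sum, slices))
    total = sum(partial[0] for partial in partials)
    count = sum(partial[1] for partial in partials)
    return total / count if count else None


# Each inner list stands for the rows held by one slice (hypothetical data).
slices = [
    [{"amount": 10.0}, {"amount": 20.0}],
    [{"amount": 30.0}],
    [{"amount": 40.0}, {"amount": 50.0}, {"amount": 60.0}],
    [],
]
average = leader_average(slices)
```

Each slice returns a partial total and a row count rather than a partial average, because averages of unequal-sized partitions cannot be combined correctly; the leader computes the final value from the merged totals.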

Redshift achieves additional OLAP performance efficiencies by exploiting columnar data storage for database tables. OLAP queries often access a small number of columns at a time (e.g., in Dimension and Fact Tables for Star Schemas). Each data block therefore stores the values of a single column for many rows, in contrast to the row-wise data blocks that benefit OLTP. Because every value in a columnar data block shares a data type, the block can use a compression scheme suited to that type, thereby decreasing space requirements and reducing I/O operations.
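The benefit of a columnar layout can be seen in a small Python sketch. The sample rows and helper names are hypothetical, the sketch assumes every row has the same columns, and run-length encoding is used only as a simple illustration of how a column of same-typed, repetitive values compresses well.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Row-wise representation, as an OLTP system would favor (hypothetical data).
rows = [
    {"region": "east", "year": 2023, "amount": 10.0},
    {"region": "east", "year": 2023, "amount": 12.5},
    {"region": "east", "year": 2023, "amount": 7.0},
    {"region": "west", "year": 2023, "amount": 30.0},
    {"region": "west", "year": 2023, "amount": 4.5},
]

columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
total_amount = sum(columns["amount"])

# Repetitive columns shrink to a handful of (value, count) pairs.
encoded_region = run_length_encode(columns["region"])
encoded_year = run_length_encode(columns["year"])
```

Summing the amount column never touches the region or year values, which is the I/O saving described above; the five region values compress to two pairs and the five year values to one.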

Redshift also offers node type options for using solid-state storage devices and -- like RDS and Aurora with their reserved instances -- enables the purchase of reserved nodes, offering considerable cost savings compared to on-demand pricing.

Redshift clusters are similar to Aurora, Neptune and DocumentDB clusters -- but noteworthy relationship and attribute differences justify differentiating Redshift clusters from the other DB clusters. A Redshift cluster -- like an Aurora, Neptune or DocumentDB cluster -- is a kind of DB cluster, which in turn is a kind of cluster. A Redshift cluster inherits a KMS customer master key (i.e., a CMK for data encryption), IAM roles, and cluster snapshots because it is a kind of DB cluster. A Redshift cluster indirectly inherits (because a DB cluster is a kind of cluster) many security groups, a subnet group, a parameter group, and an engine version, and it is a source for event notifications. After a Redshift cluster is created, customary SQL client tools (i.e., tools compatible with PostgreSQL or other SQL client tools that are DBMS and platform independent) can be used to connect to the cluster database.
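The "kind of" relationships described above can be sketched as a small Python class hierarchy. This is a modeling illustration only: the class names, field names, and sample values are hypothetical and do not correspond to any AWS SDK or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Attributes every cluster carries (hypothetical field names)."""
    security_groups: List[str] = field(default_factory=list)
    subnet_group: str = ""
    parameter_group: str = ""
    engine_version: str = ""
    event_subscriptions: List[str] = field(default_factory=list)


@dataclass
class DBCluster(Cluster):
    """A DB cluster is a kind of cluster and adds encryption, roles, snapshots."""
    kms_key_id: str = ""
    iam_roles: List[str] = field(default_factory=list)
    snapshots: List[str] = field(default_factory=list)


@dataclass
class RedshiftCluster(DBCluster):
    """A Redshift cluster is a kind of DB cluster built from nodes."""
    leader_node: str = "leader"
    compute_nodes: List[str] = field(default_factory=list)


# Sample values are placeholders, not real resource identifiers.
warehouse = RedshiftCluster(
    security_groups=["sg-example"],
    subnet_group="example-subnet-group",
    kms_key_id="example-key",
    compute_nodes=["compute-0", "compute-1"],
)
```

The Redshift cluster object exposes the security groups and subnet group declared two levels up, and the KMS key declared one level up, without redeclaring any of them, which mirrors the direct and indirect inheritance described in the text.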