Lesson 1. Understanding SQL

In this lesson, you’ll learn exactly what SQL is and what it will do for you.

Database Basics

The fact that you are reading a book on SQL indicates that you, somehow, need to interact with databases. SQL is a language used to do just this, so before looking at SQL itself, it is important that you understand some basic concepts about databases and database technologies.

Whether you are aware of it or not, you use databases all the time. Each time you select a contact on your phone or a name from your email address book, you are using a database. If you conduct a Google search, you are using a database. When you log in to your network at work, you are validating your name and password against a database. Even when you use your ATM card at a cash machine, you are using databases for PIN verification and balance checking.

But even though we all use databases all the time, there remains much confusion over what exactly a database is. This is especially true because different people use the same database terms to mean different things. Therefore, a good place to start our study is with a list and explanation of the most important database terms.

Tip: Reviewing Basic Concepts

What follows is a very brief overview of some basic database concepts. It is intended to either jolt your memory if you already have some database experience, or to provide you with the absolute basics if you are new to databases. Understanding databases is an important part of mastering SQL, and you might want to find a good book on database fundamentals to brush up on the subject if needed.

Databases

The term database is used in many different ways, but for our purposes (and indeed, from SQL’s perspective) a database is a collection of data stored in some organized fashion. The simplest way to think of it is to imagine a database as a filing cabinet. The filing cabinet is simply a physical location to store data, regardless of what that data is or how it is organized.

New Term: Database

A container (usually a file or set of files) to store organized data.

Caution: Misuse Causes Confusion

People often use the term database to refer to the database software they are running. This is incorrect and a source of much confusion. Database software is actually called the Database Management System (or DBMS). The database is the container created and manipulated via the DBMS, and exactly what the database is and what form it takes vary from one database to the next.

Tables

When you store information in your filing cabinet, you don’t just toss it in a drawer. Rather, you create files within the filing cabinet, and then you file related data in specific files.

In the database world, that file is called a table. A table is a structured file that can store data of a specific type. A table might contain a list of customers, a product catalog, or any other list of information.

New Term: Table

A structured list of data of a specific type.

The key here is that the data stored in the table is one type of data or one list. You would never store a list of customers and a list of orders in the same database table. Doing so would make subsequent retrieval and access difficult. Rather, you’d create two tables, one for each list.

Every table in a database has a name that identifies it. That name is always unique—meaning no other table in that database can have the same name.

Note: Table Names

What makes a table name unique is actually a combination of several things including the database name and table name. Some databases also use the name of the database owner as part of the unique name. This means that while you cannot use the same table name twice in the same database, you definitely can reuse table names in different databases.

Tables have characteristics and properties that define how data is stored in them. These include information about what data may be stored, how it is broken up, how individual pieces of information are named, and much more. This set of information that describes a table is known as a schema, and schemas are used to describe specific tables within a database, as well as entire databases (and the relationship between tables in them, if any).

New Term: Schema

Information about database and table layout and properties.

Columns and Datatypes

Tables are made up of columns. A column contains a particular piece of information within a table.

New Term: Column

A single field in a table. All tables are made up of one or more columns.

The best way to understand this is to envision database tables as grids, somewhat like spreadsheets. Each column in the grid contains a particular piece of information. In a customer table, for example, one column contains the customer number, another contains the customer name, and the address, city, state, and ZIP code are all stored in their own columns.

Tip: Breaking Up Data

It is extremely important to break data into multiple columns correctly. For example, city, state, and ZIP (or postal) code should always be separate columns. When you break these out, it becomes possible to sort or filter data by specific columns (for example, to find all customers in a particular state or in a particular city). If city and state are combined into one column, it would be extremely difficult to sort or filter by state.

When you break up data, the level of granularity is up to you and your specific requirements. For example, addresses are typically stored with the house number and street name together. This is fine, unless you might one day need to sort data by street name, in which case splitting house number and street name would be preferable.

Each column in a database has an associated datatype. A datatype defines what type of data the column can contain. For example, if the column were to contain a number (perhaps the number of items in an order), the datatype would be a numeric datatype. If the column were to contain dates, text, notes, currency amounts, and so on, the appropriate datatype would be used to specify this.

New Term: Datatype

A type of allowed data. Every table column has an associated datatype that restricts (or allows) specific data in that column.

Datatypes restrict the type of data that can be stored in a column (for example, preventing the entry of alphabetical characters into a numeric field). Datatypes also help sort data correctly and play an important role in optimizing disk usage. As such, special attention must be given to picking the right datatype when tables are created.

Caution: Datatype Compatibility

Datatypes and their names are one of the primary sources of SQL incompatibility. While most basic datatypes are supported consistently, many more advanced datatypes are not. And worse, occasionally you’ll find that the same datatype is referred to by different names in different DBMSs. There is not much you can do about this, but it is important to keep in mind when you create table schemas.

Rows

Data in a table is stored in rows; each record saved is stored in its own row. Again, envisioning a table as a spreadsheet style grid, the vertical columns in the grid are the table columns, and the horizontal rows are the table rows.

For example, a customers table might store one customer per row. The number of rows in the table is the number of records in it.

New Term: Row

A record in a table.

Note: Records or Rows?

You may hear users refer to database records when referring to rows. For the most part the two terms are used interchangeably, but row is technically the correct term.

Primary Keys

Every row in a table should have some column (or set of columns) that uniquely identifies it. A table containing customers might use a customer number column for this purpose, whereas a table containing orders might use the order ID. An employee list table might use an employee ID. A table containing a list of books might use the ISBN for this purpose.

New Term: Primary key

A column (or set of columns) whose values uniquely identify every row in a table.

This column (or set of columns) that uniquely identifies each row in a table is called a primary key. The primary key is used to refer to a specific row. Without a primary key, updating or deleting specific rows in a table becomes extremely difficult as there is no guaranteed safe way to refer to just the rows to be affected.

Tip: Always Define Primary Keys

Although primary keys are not actually required, most database designers ensure that every table they create has a primary key so that future data manipulation is possible and manageable.

Any column in a table can be defined as the primary key, as long as it meets the following conditions:

Image No two rows can have the same primary key value.

Image Every row must have a value in the primary key column(s). (So, no NULL values.)

Image Values in primary key columns should never be modified or updated.

Image Primary key values should never be reused. (If a row is deleted from the table, its primary key may not be assigned to any new rows in the future.)

Primary keys are usually defined on a single column within a table. But this is not required, and multiple columns may be used together as a primary key. When multiple columns are used, the rules listed above must apply to all columns, and the values of all columns together must be unique (individual columns need not have unique values).

There is another very important type of key called a foreign key, but I’ll get to that later on in Lesson 12, “Joining Tables.”

What Is SQL?

SQL (pronounced as the letters S-Q-L or as sequel) is an abbreviation for Structured Query Language. SQL is a language designed specifically for communicating with databases.

Unlike other languages (spoken languages like English, or programming languages like Java, C, or Python), SQL is made up of very few words. This is deliberate. SQL is designed to do one thing and do it well—provide you with a simple and efficient way to read and write data from a database.

What are the advantages of SQL?

Image SQL is not a proprietary language used by specific database vendors. Almost every major DBMS supports SQL, so learning this one language will enable you to interact with just about every database you’ll run into.

Image SQL is easy to learn. The statements are all made up of descriptive English words, and there aren’t that many of them.

Image Despite its apparent simplicity, SQL is a very powerful language, and by cleverly using and combining its language elements, you can perform very complex and sophisticated database operations.

And with that, let’s learn SQL.

Note: SQL Extensions

Many DBMS vendors have extended their support for SQL by adding statements or instructions to the language. The purpose of these extensions is to provide additional functionality or simplified ways to perform specific operations. And while often extremely useful, these extensions tend to be very DBMS specific, and they are rarely supported by more than a single vendor.

Standard SQL is governed by the ANSI standards committee, and is thus called ANSI SQL. All major DBMSs, even those with their own extensions, support ANSI SQL. Individual implementations have their own names (PL-SQL, used by Oracle; Transact-SQL, used by Microsoft SQL Server; and so on).

For the most part, the SQL taught in this book is ANSI SQL. On the odd occasion where DBMS-specific SQL is used, it is so noted.

Try It Yourself

As with any language, the best way to learn SQL is to try it for yourself. To do this, you’ll need a database and an application with which to test your SQL statements.

All of the lessons in this book use real SQL statements and real database tables, and you should have access to a DBMS to follow along.

Tip: Which DBMS Should You Use?

You need access to a DBMS to follow along. But which should you use?

The good news is that the SQL you’ll learn in this book is relevant to every major DBMS. As such, your choice of DBMS should primarily be based on convenience and simplicity.

There are basically two ways to proceed. You can install a DBMS (and supporting client software) on your own computer; this will give you the greatest access and control. But for many, the trickiest part of getting started learning SQL is actually getting a DBMS installed and configured. The alternative is to access a remote (or cloud-based) DBMS; this way you have nothing to manage and install.

You have lots of options if you decide to install your own DBMS. Here are a couple of suggestions:

Image MySQL (or its spin-off MariaDB) is a really good choice in that it is free, supported on every major operating system, is easy to install, and is one of the most popular DBMSs in use. MySQL comes with a command-line tool for actually entering your SQL, but you are better using the optional MySQL Workbench, so download that, too (it’s usually a separate install).

Image Windows users may want to use Microsoft SQL Server Express. This free version of the popular and powerful SQL Server includes a user-friendly client named SQL Server Management Studio.

The alternative is to use a remote (or cloud-based) DBMS:

Image If you are learning SQL to use at work, your employer may have a DBMS that you can use. If this is an option, you’ll likely be given your own DBMS login and a tool to use to connect to the DBMS to enter and test your SQL.

Image Cloud-based DBMSs are instances of DBMSs running on virtual servers, effectively giving you the benefits of your own DBMS without having to actually install one locally. All of the major cloud service vendors (including Google, Amazon, and Microsoft) offer DBMSs in the cloud. Unfortunately, at the time of this book’s writing, setting these up (including configuring secure remote access) isn’t trivial and is often more work than installing your own DBMS locally. The exceptions are Oracle’s Live SQL and IBM’s Db2 on Cloud, which offer a free version that includes a web interface. Just type your SQL in the web browser, and you’re good to go.

You’ll find links to all the options mentioned here on the book’s web page, and as DBMS options evolve that page will be updated with tips and suggestions.

Once you have access to a DBMS, Appendix A, “Sample Table Scripts,” explains what the example tables are and provides details on how to obtain (or create) them so that can may follow along with the instructions in each lesson.

In addition, starting in Lesson 2 you’ll find Challenges after the “Summary” section. They present you with the opportunity to take your newly acquired SQL knowledge and apply it to solve problems not explicitly mentioned in the lessons. To verify your solutions (or if you get stuck and need some help), visit the book’s web page.

Summary

In this first lesson, you learned what SQL is and why it is useful. Because SQL is used to interact with databases, you also reviewed some basic database terminology.