The design of a relational database is usually managed in several stages. The first stage is to identify the relational schema for the tables.
To clarify the process, consider the design of a small database of personally owned books. Here is a reasonable schema:
Authors(
id
, lastName, firstName, yob)
Books(title, edition, hardcover, publisher, pubYear,
isbn
, numPages)
Publishers(
id
, name, city, country, url)
Most of the fields specified here are text type (String
, in Java). The Authors.yob
(for year of birth), Books.pubYear
, and Books.numPages
fields should be integer type. The primary keys are Authors.id
, Books.isbn
, and Publishers.id
. A publisher's ID would be a short string, such as A-W for Addison-Wesley, or Wiley for John Wiley and Sons.
The next step is to identify the foreign key constraints. One is obvious: Books.publisher
references Publishers.id
.
But we can't use foreign keys to link the Authors
and Books
tables because their relationship is many-to-many: an author could write several books, and a book may have several authors. Foreign key relationships must be many-to-one. The solution is to add a link table that uses foreign keys to connect to these other two tables: AuthorsBooks(
author
,
book
)
.
This table has a two-field key: {author, book}
. This is necessary because neither field by itself will have unique values. For example, it might contain these three rows:
Author |
Book |
---|---|
JRHubb |
978-0-07-147698-0 |
JRHubb |
978-0-13-093374-0 |
AHuray |
978-0-13-093374-0 |
Table 5-3. Link table for Library database
Now we can define the remaining foreign keys for our schema:
AuthorsBooks.author
references Authors.id
AuthorsBooks.book
references Books.isbn
Our complete schema now has four tables, each with a primary key, and three foreign keys:
Authors(
id
, lastName, firstName, yob)
Books(title, edition, hardcover, publisher, pubYear,
isbn
, numPages)
Publishers(
id
, name, city, country, url)
AuthorsBooks(
author
,
book
)
Books.publisher
references Publishers.id
AuthorsBooks.author
references Authors.id
AuthorsBooks.book
references Books.isbn
This schema is illustrated in Figure 5-2. The primary keys are underlined, and the foreign keys are indicated by arrows.
Now we have decided upon our schema (Figure 5-2). It shows the four tables Authors
, Books
, Publishers
, and AuthorsBooks
. Each table lists its fields, with its primary key field(s) underlined. The foreign key constraints are shown by arrows. For example, Books.isbn
is a foreign key for the AuthorsBooks.book
field.
Figure 5-2. Schema for library database
The next step is to create the database. To do that, we must first choose a relational database system (RDBMS).
There are many good RDBMSs available. If you have access to a proprietary RDBMS provided by your school or employer, that might be your best choice. Among the most popular are Oracle RDB, Microsoft SQL Server, and IBM DB2. There are also several good, open-access (free) RDBMSs available. The most popular among them is MySQL, which is described in the Appendix.
Since we have been using the NetBeans IDE for our Java examples in this book, we will use its Java DB database system in this chapter. However, all the SQL commands shown in the listings here should run the same in MySQL, Oracle, SQL Server, or DB2. The NetBeans Java DB website (https://netbeans.org/kb/docs/ide/java-db.html) provides more information on the Java DB database system. The website NetBeans MySQL explains how to connect to the MySQL database system from NetBeans.
Database developers communicate with a RDBMS by means of SQL (Structured Query Language). Unlike programming languages such as Java and Python, SQL is primarily a declarative language. That means that it responds to individual commands, like the language R or OS commands. Later in the chapter, we will see how to use JDBC to execute SQL commands from within Java programs.
Figure 5-3. Starting Java DB
The RDBMS that comes with NetBeans is called Java DB. Follow these steps to create your Library
database:
Figure 5-4. Creating the database
Library
for Database Name, and then pick a username and a simple password that you can remember. If you want to also specify the location for your database, click on the Properties… button.Figure 5-5. Connected to the database
create table
command, exactly as shown in Listing 5-1. Notice the commas at the ends of the four lines 2-5, but not at the end of line 6 (in SQL, the comma is a separator, not a terminator). Also, in contrast to Java, SQL uses parentheses (lines 1 and 7) instead of braces to delimit the field definition list.Listing 5-1. SQL Create table command
Figure 5-6. Publishers table created
The SQL query language is big and rather complex. This is due partly to its age—it's almost 50 years old. It's also partly because the language has evolved through many iterations, heavily influenced by powerfully competitive RDBMS vendors: IBM, Microsoft, and Oracle. But fear not; we'll be using only a small part of the language in this book.
SQL commands are separated into two groups: those that are used to define a database, and those that are used to access its data. The first is called the Data Definition Language (DDL), and the second is called the Data Manipulation Language (DML). The create database
command used in step 4 previously, and the create table
command used in Listing 5-1, are DDL statements.
Here are some other common DDL commands:
alter table drop table create index
Here are a few common DML commands:
insert into update delete from
We use the terms command and statement interchangeably in SQL.
Many authors prefer to write SQL statements using all capital letters. But unlike Java and most other procedural languages, SQL is case insensitive—it ignores differences between uppercase and lowercase. Whether you use uppercase or lowercase is mainly just a question of style.
Continuing the creation of your Library
database, the next step is to execute the create table
command for your Books
table, as shown in Listing 5-2. This has some new features that we didn't have in the Publishers
table.
First, note that we are using the int
datatype (lines 3, 6, and 8). This is the same as the int
type in Java. In SQL, it can also be written as integer
.
Listing 5-2. Create books command
Secondly, in the declaration of the cover
field (line 4), we are using a check
clause to limit the possible values of that field to only 'HARD'
and 'SOFT'
. This is essentially a nominal datatype, comparable to an enumerated (enum
) type in Java.
Also note that in SQL, all strings are delimited by apostrophes (the single quote character) instead of regular quotation marks as in Java: write 'HARD'
not "HARD"
.
Finally, note the references
clause in line 5. This enforces the foreign key: Books.publisher
references Publishers.id
. Notice that both fields, Books.publisher
and Publishers.id
, have the same datatype: char(4)
. A foreign key should have the same datatype as the primary key that it references.
Your database structure should now look like the display in the following screenshot, but with your username in place of JHUBBARD:
Figure 5-7. Database structure
The SQL commands for creating the Authors
table and the AuthorsBooks
table are shown in Listing 5-3 and Listing 5-4:
Listing 5-3. Create Authors command
In Listing 5-4, the two required foreign keys are specified by references
clauses (lines 2 and 3). Also note that the two-field primary key
is specified separately, at line 4. Single-field primary keys (for example, Authors.id
) can be specified either way, as an appending clause or on a separate line. But multi-field keys must be specified separately this way.
Listing 5-4. Create AuthorsBooks
All four of these SQL commands have been combined into a single file, named CreateTables.sql
and shown in Listing 5-5. This is called an SQL script, which you can download from the Packt website.
Listing 5-5. SQL script creating all four tables
The script begins with a seven-line header comment that identifies the file by name and states its purpose, identifies the book, the author, and the date it was written. In SQL, anything that appears on a line after a double dash is ignored by the SQL interpreter as a comment.
The four tables must first be deleted before they can be recreated. This is done with the drop table
command at lines 9-12.
The tables must be dropped in the reverse order from which they were created. This accommodates the foreign key constraints. For example, the Books
table depends on the Publishers
table (see Figure 5-2), so it must be dropped before the Publishers
table. Of course, the order for the creations of the tables is the reverse, for the same reason: The Publishers
table must be created before the Books
table is.
Notice that this script contains eight SQL statements: four drop
and four create
. The statements must be separated by semicolons, although a single statement can be run without a semicolon.
To run the script in the NetBeans SQL editor, right-click anywhere in the window and select Run File.
Data is inserted into a database table by means of the insert
into
statement. This is illustrated in the following screenshot:
Listing 5-6. Inserting one row of data
To run this command, first open a new SQL editor window. Right-click on the database connection icon and select Execute Command…. Enter the SQL code as shown here. Then right-click anywhere in the window and select Run Statement.
To see that the data really has been inserted in to the table, right-click on the icon for the PUBLISHERS table (in the NetBeans Services tab) and select View Data…. An output window will appear:
Figure 5-8. View of data in Publishers table
Notice the label at the top of this output window: SELECT * FROM JHUBBARD.PU…. This is the first part of the complete query that was executed to produce this output. You can see that complete query in line 1 of the SQL editor window above it:
Figure 5-9. Query executed to produce output
There are two versions of the insert
into
statement. One version is illustrated in Listing 5-6. It requires a value for each field that is defined in the table. If any of those values is unknown or does not exist, then a more detailed version of the insert
into
statement is needed. This second version simply lists each field for which values are being provided.
The insert
statement executed in Listing 5-7 illustrates this second version:
Listing 5-7. Insert specifying field names
This publisher has no URL, so that field must be left empty. Notice that the output in Figure 5-10 shows that field value as <NULL>:
Figure 5-10. Current data in Publishers table
One advantage of this version is that you can list the fields in any order, as long as the order of their values matches.
Data can also be inserted into tables using insert
into
statements in batch mode. This simply means combining many of the statements together, one after the other, in a single SQL file, and then running the file itself, as we did with the create table
commands in Listing 5-5.
Run this batch file in the SQL Editor:
Listing 5-8. Batch mode insertions
Then check the results:
Figure 5-11. Publishers table
The only problem with this approach is the tedium of writing all those insert
into
statements into the file. We shall see a more elegant solution later using JDBC instead.
An SQL database query is an SQL statement that requests information from the database. If the query is successful, it results in a dataset that is usually presented in tabular form; that is, a virtual table. Figure 5-9 shows an SQL query that was generated automatically when we clicked on View Data… in the drop down menu on the Publishers
table. The key word to notice there is the first word, select.
The select
statement has more clauses and variations than any other statement in SQL. It is fairly well covered on the Wikipedia page. We shall look at just a few variations in this section.
The query in Listing 5-9 returns the name
, city
, and country
of those publishers who are not US companies, sorting the results by name
.
Listing 5-9. A query on the Publishers table
Both the where
clause and the order
by
clause are optional. The symbol <> means "not equal".
The query in Listing 5-10 shows that if you list two fields in an order
by
clause, the results will be sorted by the second field when tied in the first field.
Listing 5-10. Query ordered by two fields
Standard SQL has four categories of data types: numbers, character strings, bit strings, and temporal types.
The numeric types include int
, smallInt
, bigInt
, double
, and decimal(p,s)
, where p
is the precision (the maximum number of digits) and s
is the scale (the number of digits after the decimal point).
The character string types include char(n)
and varchar(n)
, as we have already seen.
The two bit string types are bit(n)
and bit varying(n)
. These can be interpreted as arrays of single bits—zeros and ones.
The temporal types include date
, time
, and timestamp
. A timestamp is an instance in time that specifies the date and time, such as 2017-04-29 14:06. Notice that the ISO standard is used for dates; for example, April 29, 2017 is denoted as 2017-04-29.
Because of the competition among the major RDMS vendors, many SQL types have alternative names. For example, integer
can be used in place of int
, and character
can be used in place of char
. MySQL uses int
and char
.
The Java Database Connectivity (JDBC) API is a library of Java packages and classes that provide easy execution of SQL commands on a RDB from within a Java program. This API is called a database driver. It can be downloaded from the provider of your DBMS, for example:
ojdbc6.jar
for Oracle Database 11gIf you are using Java DB in NetBeans, no download is necessary; JDBC is already installed. But the library does have to be added to your project.
To use JDBC in a NetBeans project, follow these steps to add it to your project:
Figure 5-12. Adding the JDBC library in NetBeans
To test your JDBC connection, run the simple test program, JDBCTester,
which is shown in Listing 5-11:
Listing 5-11. JDBC program access to a database
The import
statements on lines 8-11 can be inserted automatically when you write lines 20-21 (just click on the tiny red ball on the line number where the import is needed, and then select the import statement to be inserted). Notice that these objects, DriverManager
, Connection
, Statement
, and SQLException
are all instances of classes that are defined in the java.sql
package.
The three constants defined at lines 14-16 are used by the DriverManager
.getConnection()
method at line 20. The URL locates and identifies the database. The USR
and PWD
constants should be initialized with the username and password that you chose when you created the database. See Figure 5-3.
A Statement
object is instantiated at line 21. It will contain an SQL statement that can be executed against your database. That code will fit in the area where the blank line 22 is now.
The Java program in Listing 5-12 fills in the gap at line 22 of the program in Listing 5-11:
Listing 5-12. Program to print Publishers data
The SQL statement to be executed is defined as a String
object at line 20. Line 23 passes that string to the executeQuery()
method of our stmt
object, and then saves the results in a ResultSet
object.
A ResultSet
is a table-like object that provides more than 70 getter methods to access its contents. Since we asked for two fields (name and city) from the Publishers
table, and we know that it has six rows, we can conclude that this rs
object will act like a six by two array of Strings
. The while
loop at lines 26-30 will iterate down that array, one row at a time. At each row, we read the two fields separately, at lines 25-27, and then print them at line 28. The output is shown in Figure 5-13:
Figure 5-13. JDBC program output
In the previous example, we used a JDBC statement object to query the Library
database. We can also use the more flexible PreparedStatement
, which includes variables that can be assigned different values dynamically.
The program in Listing 5-13 shows how to insert new records into a database table. It reads the six lines of data from the CSV file shown in Figure 5-14, inserting the data in each line as a new record into the Authors
table of our Library
database:
Figure 5-14. An external data file
To read from the file, we instantiate a File
object at line 24 and a Scanner
object at line 28.
At line 25, the SQL insert
statement is coded into the sql
string using four question marks (?
) as place holders. These will be replaced by the actual data at lines 38-41. The PreparedStatement
object, ps
, is instantiated at line 28 using that sql
string.
Each iteration of the while
loop at lines 30-43 reads one line from the file. A separate scanner parses the line, reading the four CSV values at lines 33-37. Those four values are then pasted into the PreparedStatement
at lines 37-40. On the third iteration, they would be the strings "WirthNik"
, "Wirth"
, "Niklaus"
, and the integer 1934, forming the complete SQL statement:
insert into Authors values('WirthNik', 'Wirth', 'Niklaus', 1934)
This is then executed at line 41.
Since each execution of that PreparedStatement
changes only one row in the table, the rows
counter is incremented (by one) at line 41. So, the output from line 45 is:
6 rows inserted.
Listing 5-13. JDBC program to insert data
The program in Listing 5-13 suggests a general process, using JDBC PreparedStatement
to load the complete database from external files. The following examples show how to do that:
Figure 5-15. Data file for Publishers table
The data file shown in Figure 5-15 contains data for 12 complete records for our Publishers
table. The file has one record per line, separating the fields by the slash character (/
). The program in the following screenshot loads all that data into the Publishers
table:
Listing 5-14. Program to load data into the Publishers table
The PreparedStatement
is defined at line 24 as the constant string SQL
, and the data file is defined as the File
object DATA
at line 23.
At lines 31-33, we first use anonymous Statement
objects to execute delete
from
statements on the AuthorsBooks
, Books
, and Publishers
tables to empty them. We want the Publishers
table to be emptied so that it will eventually contain only those records that are in this data file. And we must first empty the AuthorsBooks
and Books
tables because of their foreign key constraints (see Figure 5-2). SQL will not allow a record to be deleted if it is referenced by another existing record.
The rest of the code in Listing 5-14 is like that in Listing 5-13, except for the processing of the url
field. As we can see in lines 1 and 3 of the data file (Figure 5-15), some Publishers
records do not have values for the url
field (those publishers went out of business before the internet was invented). In situations like this, the database table should have the NULL
value in those missing slots. In this case, we want the NULL
value of the varchar()
type. That is accomplished by the code at line 50:
ps.setNull(5, Types.VARCHAR);
We first use the conditional expression operator at line 44 to store the empty string in the url
variable if the end-of-line has been reached in parsing the data file. Then the if
statement at line 44 does the right thing, whether the url
exists or not.
Data for the Authors
table is shown in Figure 5-16:
Figure 5-16. Data for the Authors table
These 25 records can be loaded into the Authors
table using the program in Listing 5-15. It works the same way as the program in Listing 5-14.
The Packt website for this book has all these files and programs available for download. It also includes data files and programs for loading the other two tables (Books
and AuthorsBooks
) for the Library
database.
Listing 5-15. Program to load data into the Authors table
A database view is essentially a virtual table. It is created by embedding a select
statement within a create
view
statement. You can then apply select
queries to that view as though it were an actual table or result set.
The query shown in Listing 5-16 lists those authors who have published books in the US. The from
clause at line 2 lists all four tables because the query must link fields in the Authors
table to fields in the Publishers
table (see Figure 5-2). The three conditions at lines 3-5 correspond to the three foreign keys in the database. The order
by
clause at line 7 sorts the results alphabetically by last name, and the keyword distinct
at line 1 removes duplicate rows from the output.
Listing 5-16. Selecting American authors
The output from this (or any) query is a virtual table. It has the same structure as an ordinary database table, but its data is not stored here as a separate table. A virtual table stores only references to real data that is stored in real tables elsewhere. Nevertheless, the concept of the virtual table is so useful that SQL allows us to name it and then use it as we would a real table. This is called a database view.
The view named AmericanAuthors
is created in Listing 5-17. It simply prefaces the previous select statement with the code create
view
AmericanAuthors
as
in line 1.
Listing 5-17. Creating the View AmericanAuthors
You can see the resulting view object listing among the other database objects in the NetBeans Services tab window.
Figure 5-17. Database objects
Listing 5-18. Querying the AmericanAuthors view
Now you can select
from the AmericanAuthors
view the same way you would from any of the four real tables in the database. The query in Figure 5-18 finds all the authors in that view whose first name begins with the letter J
.
Notice the use of the SQL substring function in this query:
substr(firstName,1,1)
This returns the first letter of the firstName
string.
In general, the call
substr(string, n, length)
will return the substring of the specified string
of the specified length
, beginning with the nth character. But in SQL, counting begins with 1, not 0 as in Java, so the first character is character number 1.
Views are dynamic. If the contents of the tables from which the view was created change, then subsequent output from queries of that view will change accordingly.
For example, suppose we correct the first name of the author GERSTING
from Judith
to Judith
L.
That is done with the SQL update
command, as shown in Listing 5-19. Then, if we query our AmericanAuthors
view again, we see the change there.
Listing 5-19. Changing the Data
A view is a virtual table—it looks like a table, but it has no data itself. You can run select
statements on it, but you cannot run update
statements on a view. To change the data, you must update the table where the data resides.
Listing 5-20. Repeat query after changing data
A view is a database object that acts like a virtual table for queries. A subquery is essentially a temporary view that is defined within a query and exists only for that query. So, you can think of a subquery as a temporary virtual table.
The statement in Listing 5-21 creates a view named EuropeanBooks
that lists the isbn
codes of all the books that were published outside of the US.
Listing 5-21. Creating EuropeanBooks View
The statement in Listing 5-22 contains a subquery at lines 4-5 that will return the virtual table of isbns
of all those European books. Then the main select
statement, beginning on line 1, returns the authors of those books.
Listing 5-22. Authors of European books
Note that it is not necessary to use a view in the subquery. But it does make the query easier to understand. Like using a method in a Java program, the name of the method itself helps the code clarify what it is doing.
The query in Listing 5-23 illustrates two SQL features. It uses the aggregate function
avg()
, which returns the average of the value of the specified field, numPages
in this case. The query also specifies a label for the output: Average
, which appears (in all capital letters) at the top of that output column. The output is the single value 447
.
Listing 5-23. Average number of pages
There are five aggregate functions in standard SQL: avg()
, sum()
, count()
, max()
, and min()
.
The query in Listing 5-24 uses the previous code as a subquery, listing those books whose numPages
value is above average. It also illustrates two other SQL features. It shows how a clause can follow a subquery; in this case, the order
by
clause. It also shows how to sort the output in descending order using the desc
keyword.
Listing 5-24. Books with above average numbers of pages
A database table index is an object that is bound to a database table that facilitates searching that table.
Although the actual implementation of an index is usually proprietary and designed by the RDBMS vendor, it can be thought of as a separate file containing a multiway search tree that can locate keyed records in logarithmic time. This means that its lookup time is proportional to the logarithm of the number of records in the table. For example, if a lookup takes three probes into a table of 10,000 records, then it should be able to do a similar lookup into a table of 100,000,000 records in six probes (because log n2 = 2 log n).
Indexes take up extra space, and they take a bit of time to be updated when the table is changed by insertions or deletions. But if a large table is used mostly for lookups, then the database administrator is strongly advised to create an index for it.
A database index is usually implemented as a B-tree (invented by Rudolf Bayer and Ed McCreight when working at Boeing Research Labs in 1971), which stores key values and records addresses. The following figure illustrates this data structure:
Figure 5-18. B-tree data structure
This could be an index on a table whose key is a two-digit positive integer. To find the record with key value 94, for example, the search begins at the root of the tree, which is the node at the top. That node contains 61 and 82. Since the target value is greater than those values, the search continues down the link on the right of 82, to the node containing 90 and 96. The target is between those two values, so the search continues down the link that is between them, to the next node. That node contains 90 and 92, so the search follows the link on the right of 92 and finally reaches the leaf node that contains the address of the record with the target key value, 94.
The B-tree in Figure 5-18 has 13 key nodes containing a total of 31 keys at three levels, each node containing two four keys. But B-tree nodes can be configured to contain more than four nodes each. In fact, nodes with 20-40 keys each are not unusual. A B-tree with four levels of 40-key nodes could index a table holding 100,000,000,000 records. And yet, having only four levels means only four probes per search—that's nearly instantaneous!
The Java DB database system that comes with the NetBeans IDE automatically indexes each table that is created with a primary key. As with every object in a database, it has a name. As you can see in Figure 5-19, this index is named SQL170430152720920 in the author's version of the database:
Figure 5-19. Default index on Authors table
If you are using an RDBMS that does not automatically index its keyed tables, then you can create an index with the SQL create index
command:
create index AuthorsIndex on Authors(id)
This will create an index named AuthorsIndex
on the id
field of the Authors
table.