Chapter 8. Updating and Deleting Data

Data in databases will change often. There’s always something to change, some bit of information to add, some record to delete. For these situations in which you want to change or add pieces of data, you will mostly use the UPDATE statement. For situations in which you want to delete an entire row of data, you’ll primarily use the DELETE statement. Both of these SQL statements are covered extensively in this chapter.

Updating Data

The UPDATE statement changes the data in particular columns of existing records. The basic syntax is the UPDATE keyword followed by the table name, then a SET clause. Generally you add a WHERE clause so as not to update all of the data in a given table. Here is a generic example of this SQL statement:

UPDATE table
SET column = value, ... ;

This syntax is similar to the emphatic version of the INSERT statement, which also uses the SET clause. There isn’t a less emphatic syntax for UPDATE, as there is with INSERT. An important distinction is that there is no INTO clause. Instead, the name of the affected table is just given immediately after the UPDATE keyword.

Let’s look at an example of the UPDATE statement. In Chapter 5, we created a database called birdwatchers and a table within it called humans that would contain data about people who watch birds and use the rookery site. We then entered information on some of those people. In one of the exercises at the end of Chapter 5, we added a column (country_id) which contains the country code where the member resides. Suppose that of the few members that we’ve entered already in the table, all of them live in the United States. We could set the default value for the country_id column to us, but we’re expecting most of our members to be in a few countries of Europe. For now, we just want to update all of the rows in the humans table to set the country_id to us. Execute an UPDATE statement like this:

UPDATE birdwatchers.humans
SET country_id = 'us';

This statement will set the value for the country_id for all of the rows in the table. All of them had a NULL value before this, but if they had some other value—a different country code—those values would be changed to us. That’s a very broad and comprehensive action. Once you do this, there’s generally no way to undo it—unless you do so in an InnoDB table and do it as part of a transaction. So be careful when you use the UPDATE statement. Use a WHERE clause to pinpoint the rows you want to change, and test it first, as we will soon see.

Note that the previous UPDATE statement included the name of the database, because in previous chapters we set the mysql client to use rookery as the default database. Because all of the examples in this chapter will use the birdwatchers database, let’s change the default database to it with USE:

USE birdwatchers;

For the remainder of the examples in this chapter, you should download the rookery and the birdwatchers databases from the MySQL Resources site. They will provide you larger tables on which to work.

Updating Specific Rows

Most of the time, when you use the UPDATE statement you will need to include the WHERE clause to stipulate which rows are updated by the values in the SET clause. The conditions of a WHERE clause in an UPDATE statement are the same as that of a SELECT statement. In fact, because they’re the same, you can use the SELECT statement to test the conditions of the WHERE clause before using it in the UPDATE statement. We’ll see examples of that soon in this chapter. For now, let’s look at a simple method of conditionally updating a single row.

The humans table contains a row for a young woman named Rusty Osborne. She was married recently and wants to change her last name to her husband’s name, Johnson. We can do this with the UPDATE statement. First, let’s retrieve the record for her. We’ll select data based on her first and last name. There may be only one Rusty Osborne in the database, but there may be a few members with the family name of Osborne. So we would enter this in the mysql client:

SELECT human_id, name_first, name_last
FROM humans
WHERE name_first = 'Rusty'
AND name_last = 'Osborne';

+----------+------------+-----------+
| human_id | name_first | name_last |
+----------+------------+-----------+
|        3 | Rusty      | Osborne   |
+----------+------------+-----------+

Looking at the results, we can see that there is indeed only Rusty Osborne, and that the value of her human_id is 3. We’ll use that value in the UPDATE statement to be sure that we update only this one row. Let’s enter the following:

UPDATE humans
SET name_last = 'Johnson'
WHERE human_id = 3;

SELECT human_id, name_first, name_last
FROM humans
WHERE human_id = 3;

+----------+------------+-----------+
| human_id | name_first | name_last |
+----------+------------+-----------+
|        3 | Rusty      | Johnson   |
+----------+------------+-----------+

That worked just fine. It’s easy to use the UPDATE statement, especially when you know the identification number of the key column for the one row you want to change. Let’s suppose that two of our members who are married women have asked us to change their title from Mrs. to Ms. (this information is contained in an enumerated column called formal_title). After running a SELECT statement to find their records, we see that their human_id numbers are 24 and 32. We could then execute the following UPDATE statement in MySQL:

UPDATE humans
SET formal_title = 'Ms.'
WHERE human_id IN(24, 32);

Things get slightly more complicated when you want to change more than one row, but it’s still easy if you know the key values. In this example, we used the IN operator to list the human_id numbers to match specific rows in the table.

Suppose that after updating the title for the two women just shown, we decide that we want to make this change for all married women in the database, to get with the modern times. We would use the UPDATE statement again, but we’ll have to modify the WHERE clause. There may be too many women with the formal_title of Mrs. in the table to manually enter the human_id for all of them. Plus, there’s an easier way to do it. First, let’s see how the formal_title column looks now:

SHOW FULL COLUMNS
FROM humans
LIKE 'formal_title' \G


*************************** 1. row ***************************
     Field: formal_title
      Type: enum('Mr.','Miss','Mrs.','Ms.')
 Collation: latin1_bin
      Null: YES
       Key:
   Default: NULL
     Extra:
Privileges: select,insert,update,references
   Comment:

Looking at the enumerated values of this column, we decide that the choices seem somewhat sexist to us. We have one choice for boys and men, regardless of their age and marital status, and three choices for women. We also don’t have other genderless choices like Dr., but we decide to ignore those possibilities for now. In fact, we could eliminate the column so as not to be gender biased, but we decide to wait before making that decision. At this point, we want to change our schema so it limits the list of choices in the column to Mr. or Ms. however, we should not make that change to the schema until we fix all the existing values in the column. To do that, we’ll enter this UPDATE statement:

UPDATE humans
SET formal_title = 'Ms.'
WHERE formal_title IN('Miss','Mrs.');

Now that all of the members have either a value of Mr. or Ms. in the formal_title column, we can change the settings of that column to eliminate the other choices. We’ll use the ALTER TABLE statement covered in Chapter 4. Enter the following to change the table on your server:

ALTER TABLE humans
CHANGE COLUMN formal_title formal_title ENUM('Mr.','Ms.');

Query OK, 62 rows affected (0.13 sec)
Records: 62  Duplicates: 0  Warnings: 0

As you can see from the message in the results, the column change went well. However, if we had forgotten to change the data for one of the rows (e.g., didn’t change Miss to Ms. for one person), the Warnings would show a value of 1. In that case, you would then have to execute the SHOW WARNINGS statement to see this warning:

SHOW WARNINGS \G

*************************** 1. row ***************************
  Level: Warning
   Code: 1265
Message: Data truncated for column 'formal_title' at row 44

This tells us that MySQL eliminated the value for the formal_title column of the 44th row. We’d then have to use the UPDATE statement to try to set the formal_title for the person whose title was clobbered and hope we set the title correctly. That’s why it’s usually better to update the data before altering the table.

Sometimes, when changing bulk data, you have to alter the table before you can do the update. For example, suppose that we decide that we prefer to have the enumerated values of the formal_title set to Mr or Ms, without any periods. To do this, we would need to add that pair of choices to the ENUM column before we eliminate the old values. Then we can easily change the data to the new values. In this situation, we can tweak the criteria of the WHERE clause of the UPDATE statement. The values have a pattern: the new values are the same as the first two characters of the old value. So we can use a function to extract that part of the string. We would do something like this:

ALTER TABLE humans
CHANGE COLUMN formal_title formal_title ENUM('Mr.','Ms.','Mr','Ms');

UPDATE humans
SET formal_title = SUBSTRING(formal_title, 1, 2);

ALTER TABLE humans
CHANGE COLUMN formal_title formal_title ENUM('Mr','Ms');

The first ALTER TABLE statement adds the two new choices of titles without a period to the column, without yet eliminating the previous two choices because existing table contents use them. The final ALTER TABLE statement removes the two old choices of titles with a period from the column. Those two SQL statements are fine and not very interesting. The second one is more interesting, the UPDATE.

In the SET clause, we set the value of the formal_title column to a substring of its current value. We’re using the SUBSTRING() function to extract the text. Within the parentheses, we give the column from which to get a string (formal_title). Then we give the start of the substring we want to extract: 1, meaning the first character of the original string. We specify the number of characters we want to extract: 2. So wherever SUBSTRING() encounters “Mr.” it will extract “Mr”, and wherever it encounters “Ms.” it will extract “Ms”.

It’s critical to note that fuctions don’t change the data in the table. SUBSTRING() simply gives you back the substring. In order to actually change the column, you need the SET formal_title = clause. That changes formal_title to the value you got back from SUBSTRING(). Note that, if you wanted, you could just as easily have run SUBSTRING() on one column and used it to set the value of a different one.

In this chapter, we’ll work with a few string functions that are useful with the UPDATE statement. We’ll cover many more string functions in Chapter 10.

Limiting Updates

As mentioned near the beginning of this chapter, UPDATE can be a powerful tool for quickly changing large amounts of data in a MySQL database. As a result, you should almost always use a WHERE clause with an UPDATE statement to limit updates to rows based on certain conditions. There are times when you might also want to limit updates to a specific number of rows. To do this, use the LIMIT clause with the UPDATE statement. This clause functions the same as in the SELECT statement, but its purpose is different with UPDATE. Let’s look at an example of how and why you might use the LIMIT clause with the UPDATE statement.

Suppose that we decide to offer a small prize each month to two of the members of our site to encourage people to join. Maybe we’ll offer them the choice of a booklet with a list of birds found in their area, a nice pen with the Rookery name on it, or a water bottle with a bird image on it. Suppose also that we want a person to win only once, and we want to make sure that everyone wins eventually. To keep track of the winners, let’s create a table to record who won and when, as well as what prize they were sent and when. We’ll use the CREATE TABLE statement like so:

CREATE TABLE prize_winners
(winner_id INT AUTO_INCREMENT PRIMARY KEY,
 human_id INT,
 winner_date DATE,
 prize_chosen VARCHAR(255),
 prize_sent DATE);

In this statement, we created a table called prize_winners and gave it five columns: the first (winner_id) is a standard identifier for each row; the second (human_id) is to associate the rows in this table to the humans table; the third column (winner_date) is to record the date that the winner was determined; the next (prize_chosen) is the prize the member chose ultimately; and the last column (prize_sent) is to record the date the prize was sent to the winner.

Note

The IDs in this table may be a bit confusing. winner_id will be used to select items from this table, such as the prize and the dates. human_id will be used to find data about the winner in the humans table. You might think that there is no need for two IDs, as they both refer to the same person. But think back to the ways we used IDs to link birds, bird families, and bird orders. Giving each table its own identifier is more robust.

We could have set the prize_chosen column to an enumerated list of the choices, but the choices may change over time. We may eventually create another table containing a list of the many prizes and replace this column with a column that contains a reference number to a table listing prizes. For now, we’ll use a large variable character column.

Because we want to make sure every member wins eventually, we’ll enter a row in the prize_winners table for each member. Otherwise, we would enter a row only when the member won. This is probably the better choice for maintaining the data, but we’ll use the more straightforward method of inserting an entry for each member in the prize_winners table. We’ll use an INSERT...SELECT statement to select the winners and insert them in the new table (this type of SQL statement was covered in Inserting Data from Another Table):

INSERT INTO prize_winners
(human_id)
SELECT human_id
FROM humans;

This inserted a row in the prize_winners table for each member in the humans table. It added only the value of the human_id column, because that’s all we need at this point as no one has yet to win anything. The statement also automatically sets the winner_id column, thanks to its AUTO_INCREMENT modifier, giving it a unique value for each human. There is no reason this ID should be the same as the human_id column, because we’ll use the human_id column whenever we need information from it. The other columns currently have NULL for their values. We’ll update those values when someone wins a prize.

Now that we have a separate table for recording information about winners and their prizes each month, let’s pick some winners. We’ll do that in the next subsection.

Ordering to Make a Difference

In the previous subsection, we decided to award prizes to members so as to encourage new people to join the Rookery site, as well as to make current members feel good about continuing their membership. So that new and old members have an equal chance of winning, we’ll let MySQL randomly choose the winners each month. To do this, we’ll use the UPDATE statement with the ORDER BY clause and the RAND() function. This function picks an arbitrary floating-point number for each row found by the SQL statement in which it’s used. By putting this function in the ORDER BY clause, we will order the results based on the random values chosen for each row. If we couple that with the LIMIT clause, we can limit the results to a different pair of rows each month we select winners:

UPDATE prize_winners
SET winner_date = CURDATE()
WHERE winner_date IS NULL
ORDER BY RAND()
LIMIT 2;

There are flaws in the RAND() function. It’s not so random and can sometimes return the same results. So be careful about when you use it and for what purpose.

Let’s start at the bottom of this UPDATE statement. The ORDER BY clause is a bit ironic here because the order it puts the columns in is random. The LIMIT clause limits the results to only two rows. So everyone has an equal chance of being one of our two winners.

We can’t be sure that the top two rows are new winners, though; we might happen to choose the same person through a random process on different months. So we add a WHERE clause to update only rows in which winner_date has a value of NULL, which indicates that the member hasn’t won previously. Finally, at the top of the statement, we set the winner_date column for the winner to the current date, using a function we’ll learn about in Chapter 11.

However, there are some problems with this SQL statement that may not be obvious. First, the use of the RAND() function in an ORDER BY clause can be absurdly slow. You won’t notice the difference when used on a small table, but it performs poorly on an extremely large table that is used by a very active server. So, be mindful of which tables and situations you use the RAND() function within the ORDER BY clause. Second, using the ORDER BY clause with a LIMIT clause can cause problems if you use MySQL replication, unless you use row-based replication. This is a feature that allows you to have a master server and slave servers that replicate or copy exactly the databases on the master. That’s an advanced topic, but I want to mention this potential problem because when you use this combination of clauses with the UPDATE statement, you’ll see a warning message like this:

SHOW WARNINGS \G

*************************** 1. row ***************************
  Level: Warning
   Code: 1592
Message: Statement is not safe to log in statement format.

If you’re not using MySQL replication, you can ignore this warning. If you are using it, though, you’ll have a situation in which one slave may update its data differently from the data on the master or the other slaves—especially if you use the RAND() function (i.e., the slave will have different random results). Again, at this stage of learning MySQL, you can probably ignore this warning, and can safely use these clauses and this function. What’s important is that you’re aware of these potential problems and that you get of a sense of how extensive MySQL is.

Updating Multiple Tables

Thus far in this chapter, we have updated only one table at a time with the UPDATE statement. We’ve also made updates based on the values of the table for which the changes were made. You can also update values in one table based on values in another table. And it’s possible to update more than one table with one UPDATE statement. Let’s look at some examples of how and why you might do this.

Suppose that we’ve been giving out prizes for a couple of years now and that we’ve decided we want to make a special bid to recruit and retain members from the United Kingdom. To do this, we’ve decided to give four prizes each month to members of the Rookery site: two prizes to members in the U.K, and two prizes to members in all other countries. We’ll announce this change so that our skewing will be perceived fairly by members of the site. We’ll even allow U.K. members who won previously to win again. For this last component, we’ll need to reset the values of rows in the prize_winners table based on the value of the country_id in the humans table. Let’s see how that would look:

UPDATE prize_winners, humans
SET winner_date = NULL,
    prize_chosen = NULL,
    prize_sent = NULL
WHERE country_id = 'uk'
AND prize_winners.human_id = humans.human_id;

This SQL statement checks rows in one table, associates those rows to the related rows in another table, and changes those rows in that second table. Notice that we listed the two tables involved in a comma-separated list. We then used the SET clause to set the values of the columns related to winning a prize to NULL. In the WHERE clause, we give the condition that the country_id from the humans table has a value of uk and that the human_id in both tables equal.

Now that we’ve reset the prize information for the U.K. members, we’re ready to award prizes for the new month. Let’s try the UPDATE statement that we used previously to randomly select winners, but this time we’ll straddle both the humans and prize_winners tables by entering the following:

UPDATE prize_winners, humans
SET winner_date = CURDATE()
WHERE winner_date IS NULL
AND country_id = 'uk'
AND prize_winners.human_id = humans.human_id
ORDER BY RAND()
LIMIT 2;

ERROR 1221 (HY000): Incorrect usage of UPDATE and ORDER BY

You would expect this to work well, but it doesn’t work at all. Instead, it fails and returns the error message shown. When using the multiple-table syntax of UPDATE, it causes problems for MySQL if you include an ORDER BY or a LIMIT clause—those clauses apply to one table, not to multiple tables as in this UPDATE. Limitations like this can be frustrating, but there are ways around them. For our current task, because the ORDER BY RAND() and LIMIT clauses work with one table without problems, we can use a subquery (i.e., a query within a query) to randomly select the winners from the humans table and then update the prize_winners table. Let’s see how we would do that in this situation:

UPDATE prize_winners
SET winner_date = CURDATE()
WHERE winner_date IS NULL
AND human_id IN
  (SELECT human_id
   FROM humans
   WHERE country_id = 'uk'
   ORDER BY RAND())
LIMIT 2;

That may seem pretty complicated, but if we pull it apart, it’s not too difficult. First, let’s look at the inner query, the SELECT statement contained within the parentheses. It’s selecting the human_id for all members in the humans table, where the country_id has a value of uk, and randomly ordering the results. Notice that we’re selecting all rows for U.K. members and we’re not distinguishing whether the member was a previous winner. That’s because the inner query cannot query the table that is the target of the UPDATE. So we have to separate the conditions like we’re doing here: in the WHERE clause of the UPDATE, we’re updating only rows in which the value of the winner_date is NULL. That will be all of the U.K. members.But we could change the statement to select non-U.K. members simply by changing the operator in the subquery to !=.

In the UPDATE statement, using the IN operator, we specify that only rows whose human_id is in the results of the subquery should be updated. The LIMIT clause says to update only two rows. The LIMIT clause here is part of the UPDATE, not the subquery (i.e., the SELECT).

Because MySQL executes the subquery first, and separately from the UPDATE, there’s no problem with using the ORDER BY clause in it. Because the LIMIT clause is in an UPDATE that’s not using the multiple-table syntax, there’s no problem using it either.

The preceding example may seem cumbersome, but it solves the problem. When you can’t do something the way you would think in MySQL, you can sometimes accomplish a task with methods like using a subquery. Subqueries are covered extensively in Chapter 9.

Handling Duplicates

In Chapter 6, we covered the INSERT statement in detail. We saw several variants on its syntax and interesting ways to use it. This included INSERT...SELECT, a combination of the INSERT and SELECT statements. There is another combination related to updating rows, INSERT...ON DUPLICATE KEY UPDATE.

When inserting multiple rows of data, you may attempt inadvertently to insert rows that would be duplicates: that is to say, rows with the same value that is supposed to be unique. With the INSERT statement, you can add the IGNORE flag to indicate that duplicate rows should be ignored and not inserted. With the REPLACE statement, MySQL will replace the existing rows with the new data, or rather it will delete the existing rows and insert the new rows. As an alternative, you might want to keep the existing rows, but make a notation to them in each row. Such a situation is when INSERT...ON DUPLICATE KEY UPDATE is useful. This will make more sense with an example.

Suppose there is another bird-watchers website similar to ours that’s called Better Birders. Because that site has become inactive and the owner wants to close it, he contacts us and offers to redirect the site’s traffic to our domain if we’ll add its members to our membership. We accept this offer, so he gives us a plain-text file with a list of each member’s name and email address. There are a few ways we might import those names; some are covered in Chapter 15. But because some of the members of the other site may already be members of our site, we don’t want to import them and have duplicate entries. However, we do want to make note of those people as being members of the other site in case we want that information later. Let’s try using INSERT...ON DUPLICATE KEY UPDATE to do that. First we’ll add a column to indicate that a member came from the Better Birders site by using the ALTER TABLE statement like so:

ALTER TABLE humans
ADD COLUMN better_birders_site TINYINT DEFAULT 0;

This statement added a column named better_birders_site with a default value of 0. If someone is a member of the Better Birders site, we’ll set the column to 1. We’ll set the column to a value of 2 to indicate they are a member of both sites. Because two people can have the same name, we use the email address to determine whether a row is a duplicate. In the humans table, the email_address column is already set to UNIQUE. It will be the basis by which rows will be updated with the combined SQL statement we’ll use. With these factors in mind, let’s try to insert a few members:

INSERT INTO humans
(formal_title, name_first, name_last, email_address, better_birders_site)
VALUES('Mr','Barry','Pilson', 'barry@gomail.com', 1),
      ('Ms','Lexi','Hollar', 'alexandra@mysqlresources.com', 1),
      ('Mr','Ricky','Adams', 'ricky@gomail.com', 1)
ON DUPLICATE KEY
UPDATE better_birders_site = 2;

Because of the ON DUPLICATE KEY component, when there are rows with the same email address, the better_birders_site column will be set to 2. The rest will be inserted with their better_birders_site column set to 1. That’s what we wanted.

We now need to insert rows for these new members in the prize_winners table. We’ll use the INSERT...SELECT statement as we did earlier, but this time we’ll just insert rows where the value of the better_birders_site column is 1:

INSERT INTO prize_winners
(human_id)
SELECT human_id
FROM humans
WHERE better_birders_site = 1;

Although these two SQL statements worked well, it’s possible that there might be two entries for someone in the humans table if they used a different email address on the other site. That possibility may already exist with our existing members if they registered on the site more than once. Let’s check for this possibility and add a column to note it. We’ll enter the following SQL statements to prepare:

ALTER TABLE humans
ADD COLUMN possible_duplicate TINYINT DEFAULT 0;

CREATE TEMPORARY TABLE possible_duplicates
(name_1 varchar(25), name_2 varchar(25));

The first statement added a column to the humans table to note a row as a possible duplicate entry. The second creates a temporary table. A temporary table is accessible only to your MySQL client connection. When you exit from the client, the temporary table will be dropped automatically. Because we cannot update the same table for which we’re checking for duplicates, we can note them in this temporary table. We’ll use INSERT...SELECT to do this:

INSERT INTO possible_duplicates
SELECT name_first, name_last
FROM
  (SELECT name_first, name_last, COUNT(*) AS nbr_entries
   FROM humans
   GROUP BY name_first, name_last) AS derived_table
WHERE nbr_entries > 1;

This statement uses a subquery that selects the names and counts the number of entries based on the GROUP BY clause. We saw how to use GROUP BY and COUNT() together in Counting and Grouping Results, but their use here calls for a reiteration of how they work. The subquery selects name_first and name_last, and groups them so that any rows containing the same first and last names will be grouped together. They can then be counted. We give the result of COUNT(*) an alias of nbr_entries so that we can reference it elsewhere.

Back in the main SQL statement, the WHERE clause selects only rows from the subquery in which there are more than one entry (i.e., nbr_entries is greater than 1). These are duplicate entries. This SQL statement will insert a row into the temporary table for rows found in the humans table that have the same first and last name. It should enter only one row in the temporary table for each person.

Now that we have a list of possible duplicates in the temporary table, let’s update the humans table to note them:

UPDATE humans, possible_duplicates
SET possible_duplicate = 1
WHERE name_first = name_1
AND name_last = name_2;

That will set the value of the possible_duplicate column to 1 where the names in the humans table match the names in possible_duplicates. When we’re ready, we can send an email to these members telling them that we have two entries for their names and asking if the entries are duplicates. If they are, we might be able to merge the information together (such as by creating another column for a second email address) and delete the duplicate rows. As for the temporary table, it will be deleted when we close the MySQL client.

Deleting Data

With most databases, you will eventually need to delete rows from a table. To do this, you can use the DELETE statement. As mentioned a few times earlier in this book, there is no UNDELETE or UNDO statement for restoring rows that you delete. You can recover data from backups, if you’re making backups as you should, but it’s not quick and easy to restore data from them. If you use a storage engine like InnoDB, there is a method for wrapping SQL statements in a transaction that can be rolled back after you delete rows. However, once you commit such a transaction, you’ll have to look to backups or other cumbersome methods to restore deleted data. Thus, you should alwaysbe careful when using the DELETE statement.

The DELETE statement works much like the SELECT statement in that you may delete rows based on conditions in the WHERE clause. You should always use the WHERE clause, unless you really want to leave an empty table with no rows. You may also include an ORDER BY clause to specify the order in which rows are deleted, and a LIMIT clause to limit the number of rows deleted in a table. The basic syntax of the DELETE statement is:

DELETE FROM table
[WHERE condition]
[ORDER BY column]
[LIMIT row_count];

As the formatting indicates with square brackets, the WHERE, ORDER BY, and LIMIT clauses are optional. There are additional options that may be given and deviations to the syntax for deleting rows in multiple tables and for deletions based on multiple tables. Let’s look at an example using this simpler syntax for now.

Suppose after sending out a notice to members who we suspect of having duplicate entries in the humans table, one of them confirms that her membership has been duplicated. The member, Elena Bokova from Russia, asks us to delete the entry that uses her old yahoo.com email address. To do that, we could, but we won’t, enter this SQL statement:

DELETE FROM humans
WHERE name_first = 'Elena'
AND name_last = 'Bokova'
AND email_address LIKE '%yahoo.com';

This SQL statement will delete any rows in which the criteria expressed in the WHERE clause are met. Notice that for checking the email address, we used the LIKE operator and the wildcard (i.e., %) to match any email ending with yahoo.com.

The statement just shown would work fine, but we also need to delete the related entry in the prize_winners table. So we should first get the human_id for this row before deleting it. That’s why I said we won’t enter this SQL statement. It’s tedious, though, to execute one SQL statement to retrieve the human_id, then another to delete the row in the humans table, and then execute a third SQL statement to delete the related row in the prize_winners table. Instead, it would be better to change the DELETE statement to include both tables, deleting the desired rows from both in one SQL statement. We’ll cover that in the next subsection.

Deleting in Multiple Tables

There are many situations where data in one table is dependent on data in another table. If you use DELETE to delete a row in one table on which a row in another table is dependent, you’ll have orphaned data. You could execute another DELETE to remove that other row, but it’s usually better to delete rows in both tables in the same DELETE statement, especially when there may be many rows of data to delete.

The syntax for the DELETE that deletes rows in multiple tables is:

DELETE FROM table[, table]
USING table[,  . . . ]
[WHERE condition];

In the FROM clause, list the tables in a comma-separated list. The USING clause specifies how the tables are joined together (e.g., based on human_id). The WHERE clause is optional. Like the UPDATE statement, because this syntax includes multiple tables, the ORDER BY and LIMIT clauses are not permitted. This syntax can be tricky, but how much so may not be evident from looking at the syntax. Let’s look at an example.

In the example at the end of the previous subsection, we needed to delete rows from two tables that are related. We want to delete the rows for Elena Bokova in which she has a yahoo.com email address in both the humans and the prize_winners tables. To do that efficiently, we’ll enter this from the mysql client:

DELETE FROM humans, prize_winners
USING humans JOIN prize_winners
WHERE name_first = 'Elena'
AND name_last = 'Bokova'
AND email_address LIKE '%yahoo.com'
AND humans.human_id = prize_winners.human_id;

This DELETE statement is similar to other data manipulation statements (e.g., SELECT, UPDATE). However, there is a difference in the syntax that may be unexpected and confusing. The FROM clause lists the tables from which data is to be deleted. There is also a USING clause that lists the tables again and how they are joined. What is significant about this distinction is that we must list the tables in which rows are to be deleted in the FROM clause. If we did not include prize_winners in that list, no rows would be deleted from it—only rows from humans would be deleted.

There are several contortions and options in the syntax for DELETE. However, at this stage, the methods we reviewed in this chapter will serve well for almost all situations you will encounter as a MySQL and MariaDB developer or administrator.

Summary

The UPDATE and DELETE statements are very useful for changing data in tables; they are essential to managing a MySQL or MariaDB database. They have many possibilities for effecting changes to tables with ease. You can construct very complex SQL statements with them to change precisely the data you want to change or to delete exactly the rows you want to delete. However, it can be confusing and difficult at times. So be careful and learn these SQL statements well.

If you’re nervous at times about using the UPDATE and DELETE statements, it’s because you should be. You can change all of the rows in a table with one UPDATE statement, and you can delete all of the rows in a table with one DELETE statement. On a huge database, that could be thousands of rows of data changed or deleted in seconds. This is why good backups are always necessary. Whenever using these two SQL statements, take your time to be sure you’re right before you execute them. While you’re still learning especially, it can be a good idea to make a duplicate of a table with its data using the CREATE TABLE...SELECT statement before updating or deleting data. This SQL statement was covered in Essential Changes. This way if you make a major mistake, you can put the data back as it was before you started.

Because of the problems you can cause yourself and others who will use the databases on which you will work, practice using the UPDATE and DELETE statements. More than any other chapter in this book so far, you should make sure to complete the exercises in the next section.

Exercises

Exercises follow for you to practice using the UPDATE and DELETE statements. If you haven’t already, download the rookery and the birdwatchers databases from the MySQL Resources site). This will give you some good-sized tables on which to practice these SQL statements.

Use the CREATE TABLE...SELECT statement (see Essential Changes) to make a copies of the humans and the prize_winners tables. Name the new tables humans_copy and prize_winners_copy. Once you’ve created the copies, use the SELECT statement to view all of the rows in both of the new tables. You should see the same values as are contained in the original tables.
After you’ve done the previous exercise, use the SELECT statement to select all of the members from Austria in the humans table. You’ll need to use a WHERE clause for that SQL statement. The country_id for Austria is au. If you have problems, fix the SQL statement until you get it right.
Next, using the same WHERE clause from the SELECT statement, construct an UPDATE statement to change the value of the membership_type column for Austrian members to premium. In the same UPDATE statement, set the value of the membership_expiration to one year from the date you execute the SQL statement. You will need to use the CURDATE() function inside the DATE_ADD() function. The DATE_ADD() function was shown in an example earlier in this chapter (see Updating Specific Rows). The CURDATE() has no arguments to it, nothing to go inside its parentheses. Both functions are covered in Chapter 11. If you can’t figure out how to combine these function, you can enter the date manually (e.g., ‘2014-11-03’ for November 3, 2014; include the quote marks). Use the SELECT statement to check the results when you’re done.
Using the DELETE statement, delete the rows associated with the member named Barry Pilson from the humans and prize_winners tables. This was explained, along with an example showing how to do it, in Deleting in Multiple Tables. After you do this, use the SELECT statement to view all of the rows in both tables to make sure you deleted both rows.
Using the DELETE statement, delete all of the rows in the humans table. Then delete all of the rows of data in the prize_winners tables. Use the SELECT statement to confirm that both tables are empty.
Now copy all of the data from the humans_copy and prize_winners_copy tables to the humans and prize_winners tables. Do this with the INSERT...SELECT statement (covered in Inserting Data from Another Table).
After you’ve restored the data by this method, execute the SELECT statement again to confirm that both tables now have all of the data. If you were successful, use the DROP TABLE statement to eliminate the humans_copy and prize_winners_copy tables. This SQL statement was covered in Chapters 4 and 5. If you drop the wrong tables or if you delete data from the wrong tables, you can always download the whole database again from the MySQL Resources site.