Data in databases will change often. There’s always something to
change, some bit of information to add, some record to delete. For these
situations in which you want to change or add pieces of data, you will
mostly use the UPDATE
statement. For situations in which you
want to delete an entire row of data, you’ll primarily use the
DELETE
statement. Both of these SQL statements are covered
extensively in this chapter.
The UPDATE
statement changes the data in particular columns of existing records.
The basic syntax is the UPDATE
keyword followed by the table
name, then a SET
clause. Generally you add a
WHERE
clause so as not to update all of the data in a given
table. Here is a generic example of this SQL statement:
UPDATEtable
SETcolumn
=value
, ... ;
This syntax is similar to the emphatic version of the
INSERT
statement, which also uses the SET
clause. There isn’t a less emphatic syntax for UPDATE
, as
there is with INSERT
. An important distinction is that there
is no INTO
clause. Instead, the name of the affected table is
just given immediately after the UPDATE
keyword.
Let’s look at an example of the UPDATE
statement. In
Chapter 5, we created a database called
birdwatchers
and a table within it called humans
that would contain data about people who watch birds and use the
rookery
site. We then entered information on some of those
people. In one of the exercises at the end of Chapter 5, we added a column
(country_id
) which contains the country code where the member
resides. Suppose that of the few members that we’ve entered already in the
table, all of them live in the United States. We could set the default
value for the country_id
column to us,
but we’re expecting most of our members to be in a few countries of
Europe. For now, we just want to update all of the rows in the
humans
table to set the country_id
to
us. Execute an UPDATE
statement like
this:
UPDATE
birdwatchers
.
humans
SET
country_id
=
'us'
;
This statement will set the value for the country_id
for all of the rows in the table. All of them had a NULL value before
this, but if they had some other value—a different country code—those
values would be changed to us. That’s a very broad
and comprehensive action. Once you do this, there’s generally no way to
undo it—unless you do so in an InnoDB table and do it as part of a
transaction. So be careful when you use the UPDATE
statement.
Use a WHERE
clause to pinpoint the rows you want to change,
and test it first, as we will soon see.
Note that the previous UPDATE
statement included the
name of the database, because in previous chapters we set the mysql client to use rookery
as the
default database. Because all of the examples in this chapter will use the
birdwatchers
database, let’s change the default database to
it with USE
:
USE
birdwatchers
;
For the remainder of the examples in this chapter, you should
download the rookery
and the birdwatchers
databases from the MySQL
Resources site. They will provide you larger tables on which to
work.
Most of the time, when you use the UPDATE
statement you will need to
include the WHERE
clause to stipulate which rows are
updated by the values in the SET
clause. The conditions of
a WHERE
clause in an UPDATE
statement are the
same as that of a SELECT
statement. In fact, because
they’re the same, you can use the SELECT
statement to test
the conditions of the WHERE
clause before using it in the UPDATE
statement.
We’ll see examples of that soon in this chapter. For now, let’s look at
a simple method of conditionally updating a single row.
The humans
table contains a row for a young woman
named Rusty Osborne. She was married recently and
wants to change her last name to her husband’s name,
Johnson. We can do this with the
UPDATE
statement. First, let’s retrieve the record for her.
We’ll select data based on her first and last name. There may be only
one Rusty Osborne in the database, but there may be
a few members with the family name of Osborne. So
we would enter this in the mysql
client:
SELECT human_id, name_first, name_last
FROM humans
WHERE name_first = 'Rusty'
AND name_last = 'Osborne';
+----------+------------+-----------+ | human_id | name_first | name_last | +----------+------------+-----------+ | 3 | Rusty | Osborne | +----------+------------+-----------+
Looking at the results, we can see that there is indeed only Rusty
Osborne, and that the value of her human_id
is 3. We’ll use
that value in the UPDATE
statement to be sure that we
update only this one row. Let’s enter the following:
UPDATE humans
SET name_last = 'Johnson'
WHERE human_id = 3;
SELECT human_id, name_first, name_last
FROM humans
WHERE human_id = 3;
+----------+------------+-----------+
| human_id | name_first | name_last |
+----------+------------+-----------+
| 3 | Rusty | Johnson |
+----------+------------+-----------+
That worked just fine. It’s easy to use the UPDATE
statement, especially when you know the identification number of the key
column for the one row you want to change. Let’s suppose that two of our
members who are married women have asked us to change their title from
Mrs. to Ms. (this information
is contained in an enumerated column called formal_title
).
After running a SELECT
statement to find their records, we
see that their human_id
numbers are 24 and 32. We could
then execute the following UPDATE
statement in
MySQL:
UPDATE
humans
SET
formal_title
=
'Ms.'
WHERE
human_id
IN
(
24
,
32
);
Things get slightly more complicated when you want to change more
than one row, but it’s still easy if you know the key values. In this
example, we used the IN
operator to list the
human_id
numbers to match specific rows in the
table.
Suppose that after updating the title for the two women just
shown, we decide that we want to make this change for all married women
in the database, to get with the modern times. We would use the
UPDATE
statement again, but we’ll have to modify the
WHERE
clause. There may be too many women with the
formal_title
of Mrs. in the table to
manually enter the human_id
for all of them. Plus, there’s
an easier way to do it. First, let’s see how the
formal_title
column looks now:
SHOW FULL COLUMNS
FROM humans
LIKE 'formal_title' \G
*************************** 1. row ***************************
Field: formal_title
Type: enum('Mr.','Miss','Mrs.','Ms.')
Collation: latin1_bin
Null: YES
Key:
Default: NULL
Extra:
Privileges: select,insert,update,references
Comment:
Looking at the enumerated values of this column, we decide that
the choices seem somewhat sexist to us. We have one choice for boys and
men, regardless of their age and marital status, and three choices for
women. We also don’t have other genderless choices like
Dr., but we decide to ignore those possibilities
for now. In fact, we could eliminate the column so as not to be gender
biased, but we decide to wait before making that decision. At this
point, we want to change our schema so it limits the list of choices in
the column to Mr. or Ms.
however, we should not make that change to the schema until we fix all
the existing values in the column. To do that, we’ll enter this
UPDATE
statement:
UPDATE
humans
SET
formal_title
=
'Ms.'
WHERE
formal_title
IN
(
'Miss'
,
'Mrs.'
);
Now that all of the members have either a value of
Mr. or Ms. in the
formal_title
column, we can change the settings of that
column to eliminate the other choices. We’ll use the ALTER TABLE
statement covered in
Chapter 4. Enter the following to change
the table on your server:
ALTER TABLE humans
CHANGE COLUMN formal_title formal_title ENUM('Mr.','Ms.');
Query OK, 62 rows affected (0.13 sec)
Records: 62 Duplicates: 0 Warnings: 0
As you can see from the message in the results, the column change
went well. However, if we had forgotten to change the data for one of
the rows (e.g., didn’t change Miss to
Ms. for one person), the Warnings would show a
value of 1. In that case, you would then have to execute the SHOW
WARNINGS
statement to see this warning:
SHOW WARNINGS \G
*************************** 1. row ***************************
Level: Warning
Code: 1265
Message: Data truncated for column 'formal_title' at row 44
This tells us that MySQL eliminated the value for the
formal_title
column of the 44th row. We’d then have to use
the UPDATE
statement to try to set the
formal_title
for the person whose title was clobbered and
hope we set the title correctly. That’s why it’s usually better to
update the data before altering the table.
Sometimes, when changing bulk data, you have to alter the table
before you can do the update. For example, suppose that we decide that
we prefer to have the enumerated values of the formal_title
set to Mr or Ms, without any
periods. To do this, we would need to add that pair of choices to the
ENUM
column before we eliminate the old values. Then we can
easily change the data to the new values. In this situation, we can
tweak the criteria of the WHERE
clause of the
UPDATE
statement. The values have a pattern: the new values
are the same as the first two characters of the old value. So we can use
a function to extract that part of the string. We would do something
like this:
ALTER
TABLE
humans
CHANGE
COLUMN
formal_title
formal_title
ENUM
(
'Mr.'
,
'Ms.'
,
'Mr'
,
'Ms'
);
UPDATE
humans
SET
formal_title
=
SUBSTRING
(
formal_title
,
1
,
2
);
ALTER
TABLE
humans
CHANGE
COLUMN
formal_title
formal_title
ENUM
(
'Mr'
,
'Ms'
);
The first ALTER TABLE
statement adds the two new
choices of titles without a period to the column, without yet
eliminating the previous two choices because existing table contents use
them. The final ALTER TABLE
statement removes the two old
choices of titles with a period from the column. Those two SQL
statements are fine and not very interesting. The second one is more
interesting, the UPDATE
.
In the SET
clause, we set the value of the
formal_title
column to a substring of its current value.
We’re using the SUBSTRING()
function to extract the
text. Within the parentheses, we give the column from which to get a
string (formal_title
). Then we give the start of the
substring we want to extract: 1, meaning the first character of the
original string. We specify the number of characters we want to extract:
2. So wherever SUBSTRING()
encounters “Mr.” it will
extract “Mr”, and wherever it encounters “Ms.” it will extract
“Ms”.
It’s critical to note that fuctions don’t
change the data in the table.
SUBSTRING()
simply gives you back the substring. In order
to actually change the column, you need the SET formal_title
=
clause. That changes formal_title
to the value you
got back from SUBSTRING()
. Note that, if you wanted, you
could just as easily have run SUBSTRING()
on one column
and used it to set the value of a different one.
In this chapter, we’ll work with a few string functions that are
useful with the UPDATE
statement. We’ll cover many more
string functions in Chapter 10.
As mentioned near the beginning of this chapter, UPDATE
can be a powerful tool for quickly changing large amounts of data in a
MySQL database. As a result, you should almost always use a
WHERE
clause with an UPDATE
statement to limit
updates to rows based on certain conditions. There are times when you
might also want to limit updates to a specific number of rows. To do
this, use the LIMIT
clause with the UPDATE
statement. This clause functions the same as in the SELECT
statement, but its purpose is different with UPDATE
. Let’s
look at an example of how and why you might use the LIMIT
clause with the UPDATE
statement.
Suppose that we decide to offer a small prize each month to two of
the members of our site to encourage people to join. Maybe we’ll offer
them the choice of a booklet with a list of birds found in their area, a
nice pen with the Rookery name on it, or a water bottle with a bird
image on it. Suppose also that we want a person to win only once, and we
want to make sure that everyone wins eventually. To keep track of the
winners, let’s create a table to record who won and when, as well as
what prize they were sent and when. We’ll use the CREATE TABLE
statement like so:
CREATE
TABLE
prize_winners
(
winner_id
INT
AUTO_INCREMENT
PRIMARY
KEY
,
human_id
INT
,
winner_date
DATE
,
prize_chosen
VARCHAR
(
255
),
prize_sent
DATE
);
In this statement, we created a table called
prize_winners
and gave it five columns: the first
(winner_id
) is a standard identifier for each row; the
second (human_id
) is to associate the rows in this table to
the humans
table; the third column
(winner_date
) is to record the date that the winner was
determined; the next (prize_chosen
) is the prize the member
chose ultimately; and the last column (prize_sent
) is to
record the date the prize was sent to the winner.
The IDs in this table may be a bit confusing.
winner_id
will be used to select items from this table,
such as the prize and the dates. human_id
will be used to
find data about the winner in the humans
table. You
might think that there is no need for two IDs, as they both refer to
the same person. But think back to the ways we used IDs to link birds,
bird families, and bird orders. Giving each table its own identifier
is more robust.
We could have set the prize_chosen
column to an
enumerated list of the choices, but the choices may change over time. We
may eventually create another table containing a list of the many prizes
and replace this column with a column that contains a reference number
to a table listing prizes. For now, we’ll use a large variable character
column.
Because we want to make sure every member wins eventually, we’ll
enter a row in the prize_winners
table for each member.
Otherwise, we would enter a row only when the member won. This is
probably the better choice for maintaining the data, but we’ll use the
more straightforward method of inserting an entry for each member in the
prize_winners
table. We’ll use an
INSERT...SELECT
statement to select the winners and insert them in the new table
(this type of SQL statement was covered in Inserting Data from Another Table):
INSERT
INTO
prize_winners
(
human_id
)
SELECT
human_id
FROM
humans
;
This inserted a row in the prize_winners
table for
each member in the humans
table. It added only the value of
the human_id
column, because that’s all we need at this
point as no one has yet to win anything. The statement also
automatically sets the winner_id
column, thanks to its
AUTO_INCREMENT
modifier, giving it a unique value for each
human. There is no reason this ID should be the same as the
human_id
column, because we’ll use the
human_id
column whenever we need information from it. The
other columns currently have NULL for their values. We’ll update those
values when someone wins a prize.
Now that we have a separate table for recording information about winners and their prizes each month, let’s pick some winners. We’ll do that in the next subsection.
In the previous subsection, we decided to award prizes to members so as to encourage
new people to join the Rookery site, as well as to make current members
feel good about continuing their membership. So that new and old members
have an equal chance of winning, we’ll let MySQL randomly choose the
winners each month. To do this, we’ll use the UPDATE
statement with the ORDER BY
clause and the
RAND()
function. This function picks an arbitrary
floating-point number for each row found by the SQL statement in which
it’s used. By putting this function in the ORDER BY
clause,
we will order the results based on the random values chosen for each
row. If we couple that with the LIMIT
clause, we can limit
the results to a different pair of rows each month we select
winners:
UPDATE
prize_winners
SET
winner_date
=
CURDATE
()
WHERE
winner_date
IS
NULL
ORDER
BY
RAND
()
LIMIT
2
;
There are flaws in the RAND()
function. It’s not so
random and can sometimes return the same results. So be careful about
when you use it and for what purpose.
Let’s start at the bottom of this UPDATE
statement.
The ORDER BY
clause is a bit ironic here because the order
it puts the columns in is random. The LIMIT
clause limits
the results to only two rows. So everyone has an equal chance of being
one of our two winners.
We can’t be sure that the top two rows are new winners, though; we
might happen to choose the same person through a random process on
different months. So we add a WHERE
clause to update only rows in
which winner_date
has a value of NULL, which indicates that
the member hasn’t won previously. Finally, at the top of the statement,
we set the winner_date
column for the winner to the current
date, using a function we’ll learn about in Chapter 11.
However, there are some problems with this SQL statement that may
not be obvious. First, the use of the RAND()
function in
an ORDER BY
clause can be absurdly slow. You won’t notice
the difference when used on a small table, but it performs poorly on an
extremely large table that is used by a very active server. So, be
mindful of which tables and situations you use the RAND()
function within the ORDER BY
clause. Second, using the
ORDER BY
clause with a LIMIT
clause can cause
problems if you use MySQL replication, unless you use row-based
replication. This is a feature that allows you to have a master server
and slave servers that replicate or copy exactly the databases on the
master. That’s an advanced topic, but I want to mention this potential
problem because when you use this combination of clauses with the
UPDATE
statement, you’ll see a warning message like
this:
SHOW WARNINGS \G
*************************** 1. row ***************************
Level: Warning
Code: 1592
Message: Statement is not safe to log in statement format.
If you’re not using MySQL replication, you can ignore this
warning. If you are using it, though, you’ll have a situation in which
one slave may update its data differently from the data on the master or
the other slaves—especially if you use the RAND()
function
(i.e., the slave will have different random results). Again, at this
stage of learning MySQL, you can probably ignore this warning, and can
safely use these clauses and this function. What’s important is that
you’re aware of these potential problems and that you get of a
sense of how extensive MySQL is.
Thus far in this chapter, we have updated only one table at a time with the
UPDATE
statement. We’ve also made updates based on the
values of the table for which the changes were made. You can also update
values in one table based on values in another table. And it’s possible
to update more than one table with one UPDATE
statement.
Let’s look at some examples of how and why you might do this.
Suppose that we’ve been giving out prizes for a couple of years
now and that we’ve decided we want to make a special bid to recruit and
retain members from the United Kingdom. To do this, we’ve decided to
give four prizes each month to members of the Rookery site: two prizes
to members in the U.K, and two prizes to members in all other countries.
We’ll announce this change so that our skewing will be perceived fairly
by members of the site. We’ll even allow U.K. members who won previously
to win again. For this last component, we’ll need to reset the values of
rows in the prize_winners
table based on the value of the
country_id
in the humans
table. Let’s see how
that would look:
UPDATE
prize_winners
,
humans
SET
winner_date
=
NULL
,
prize_chosen
=
NULL
,
prize_sent
=
NULL
WHERE
country_id
=
'uk'
AND
prize_winners
.
human_id
=
humans
.
human_id
;
This SQL statement checks rows in one table, associates those rows
to the related rows in another table, and changes those rows in that
second table. Notice that we listed the two tables involved in a
comma-separated list. We then used the SET
clause to set the values of the
columns related to winning a prize to NULL. In the WHERE
clause, we give the condition that
the country_id
from the humans
table has a
value of uk and that the human_id
in
both tables equal.
Now that we’ve reset the prize information for the U.K. members,
we’re ready to award prizes for the new month. Let’s try the
UPDATE
statement that we used previously to randomly select
winners, but this time we’ll straddle both the humans
and
prize_winners
tables by entering the following:
UPDATE prize_winners, humans
SET winner_date = CURDATE()
WHERE winner_date IS NULL
AND country_id = 'uk'
AND prize_winners.human_id = humans.human_id
ORDER BY RAND()
LIMIT 2;
ERROR 1221 (HY000): Incorrect usage of UPDATE and ORDER BY
You would expect this to work well, but it doesn’t work at all.
Instead, it fails and returns the error message shown. When using the
multiple-table syntax of UPDATE
, it causes problems for
MySQL if you include an ORDER BY
or a LIMIT
clause—those clauses apply to one table, not to multiple tables as in
this UPDATE
. Limitations like this can be frustrating, but
there are ways around them. For our current task, because the
ORDER BY RAND()
and LIMIT
clauses work with
one table without problems, we can use a subquery (i.e., a query within
a query) to randomly select the winners from the humans
table and then update the prize_winners
table. Let’s see
how we would do that in this situation:
UPDATE
prize_winners
SET
winner_date
=
CURDATE
()
WHERE
winner_date
IS
NULL
AND
human_id
IN
(
SELECT
human_id
FROM
humans
WHERE
country_id
=
'uk'
ORDER
BY
RAND
())
LIMIT
2
;
That may seem pretty complicated, but if we pull it apart, it’s
not too difficult. First, let’s look at the inner query, the
SELECT
statement contained within the parentheses. It’s
selecting the human_id
for all members in the
humans
table, where the country_id
has a value
of uk, and randomly ordering the results. Notice
that we’re selecting all rows for U.K. members and we’re not
distinguishing whether the member was a previous winner. That’s because
the inner query cannot query the table that is the target of the
UPDATE
. So we have to separate the conditions like we’re
doing here: in the WHERE
clause of the UPDATE
,
we’re updating only rows in which the value of the
winner_date
is NULL. That will be all of the U.K.
members.But we could change the statement to select non-U.K. members
simply by changing the operator in the subquery to
!=
.
In the UPDATE
statement, using the IN
operator, we specify that only rows whose human_id
is in
the results of the subquery should be updated. The LIMIT
clause says to update only two rows. The LIMIT
clause here
is part of the UPDATE
, not the subquery (i.e., the
SELECT
).
Because MySQL executes the subquery first, and separately from the
UPDATE
, there’s no problem with using the ORDER
BY
clause in it. Because the LIMIT
clause is in an
UPDATE
that’s
not using the multiple-table syntax, there’s no problem using it
either.
The preceding example may seem cumbersome, but it solves the problem. When you can’t do something the way you would think in MySQL, you can sometimes accomplish a task with methods like using a subquery. Subqueries are covered extensively in Chapter 9.
In Chapter 6, we covered the
INSERT
statement in detail. We saw several variants on its syntax and
interesting ways to use it. This included INSERT...SELECT
,
a combination of the INSERT
and SELECT
statements. There is another combination related to updating rows, INSERT...ON
DUPLICATE KEY UPDATE
.
When inserting multiple rows of data, you may attempt
inadvertently to insert rows that would be duplicates: that is to say,
rows with the same value that is supposed to be unique. With the
INSERT
statement, you can add the IGNORE
flag to indicate that
duplicate rows should be ignored and not inserted. With the
REPLACE
statement, MySQL will replace the existing rows
with the new data, or rather it will delete the existing rows and insert
the new rows. As an alternative, you might want to keep the existing
rows, but make a notation to them in each row. Such a situation is when
INSERT...ON DUPLICATE KEY UPDATE
is useful. This will make
more sense with an example.
Suppose there is another bird-watchers website similar to ours
that’s called Better Birders. Because that site has
become inactive and the owner wants to close it, he contacts us and
offers to redirect the site’s traffic to our domain if we’ll add its
members to our membership. We accept this offer, so he gives us a
plain-text file with a list of each member’s name and email address.
There are a few ways we might import those names; some are covered in
Chapter 15. But because some of the members
of the other site may already be members of our site, we don’t want to
import them and have duplicate entries. However, we do want to make note
of those people as being members of the other site in case we want that
information later. Let’s try using INSERT...ON DUPLICATE KEY
UPDATE
to do that. First we’ll add a column to indicate that a
member came from the Better Birders site by using
the ALTER TABLE
statement like so:
ALTER
TABLE
humans
ADD
COLUMN
better_birders_site
TINYINT
DEFAULT
0
;
This statement added a column named
better_birders_site
with a default value of 0. If someone
is a member of the Better Birders site, we’ll set
the column to 1. We’ll set the column to a value of 2 to indicate they
are a member of both sites. Because two people can have the same name,
we use the email address to determine whether a row is a duplicate. In
the humans
table, the email_address
column is
already set to UNIQUE
. It will be the basis by which rows
will be updated with the combined SQL statement we’ll use. With these
factors in mind, let’s try to insert a few members:
INSERT
INTO
humans
(
formal_title
,
name_first
,
name_last
,
email_address
,
better_birders_site
)
VALUES
(
'Mr'
,
'Barry'
,
'Pilson'
,
'barry@gomail.com'
,
1
),
(
'Ms'
,
'Lexi'
,
'Hollar'
,
'alexandra@mysqlresources.com'
,
1
),
(
'Mr'
,
'Ricky'
,
'Adams'
,
'ricky@gomail.com'
,
1
)
ON
DUPLICATE
KEY
UPDATE
better_birders_site
=
2
;
Because of the ON DUPLICATE KEY
component, when there
are rows with the same email address, the
better_birders_site
column will be set to 2. The rest will
be inserted with their better_birders_site
column set to 1.
That’s what we wanted.
We now need to insert rows for these new members in the
prize_winners
table. We’ll use the
INSERT...SELECT
statement as we did earlier, but this time we’ll just insert rows where
the value of the better_birders_site
column is 1:
INSERT
INTO
prize_winners
(
human_id
)
SELECT
human_id
FROM
humans
WHERE
better_birders_site
=
1
;
Although these two SQL statements worked well, it’s possible that
there might be two entries for someone in the humans
table
if they used a different email address on the other site. That
possibility may already exist with our existing members if they
registered on the site more than once. Let’s check for this possibility
and add a column to note it. We’ll enter the following SQL statements to
prepare:
ALTER
TABLE
humans
ADD
COLUMN
possible_duplicate
TINYINT
DEFAULT
0
;
CREATE
TEMPORARY
TABLE
possible_duplicates
(
name_1
varchar
(
25
),
name_2
varchar
(
25
));
The first statement added a column to the humans
table to note a row as a possible duplicate entry. The second creates a
temporary table. A temporary table is accessible only to your MySQL
client connection. When you exit from the client, the temporary table
will be dropped automatically. Because we cannot update the same table
for which we’re checking for duplicates, we can note them in this
temporary table. We’ll use INSERT...SELECT
to do
this:
INSERT
INTO
possible_duplicates
SELECT
name_first
,
name_last
FROM
(
SELECT
name_first
,
name_last
,
COUNT
(
*
)
AS
nbr_entries
FROM
humans
GROUP
BY
name_first
,
name_last
)
AS
derived_table
WHERE
nbr_entries
>
1
;
This statement uses a subquery that selects the names and counts
the number of entries based on the GROUP BY
clause. We saw
how to use GROUP BY
and COUNT()
together in Counting and Grouping Results, but their use
here calls for a reiteration of how they work. The subquery selects
name_first
and name_last
, and groups them so
that any rows containing the same first and last names will be grouped
together. They can then be counted. We give the result of
COUNT(*)
an alias of nbr_entries
so that we
can reference it elsewhere.
Back in the main SQL statement, the WHERE
clause
selects only rows from the subquery in which there are more than one
entry (i.e., nbr_entries
is greater than 1). These are
duplicate entries. This SQL statement will insert a row into the
temporary table for rows found in the humans
table that
have the same first and last name. It should enter only one row in the
temporary table for each person.
Now that we have a list of possible duplicates in the temporary
table, let’s update the humans
table to note them:
UPDATE
humans
,
possible_duplicates
SET
possible_duplicate
=
1
WHERE
name_first
=
name_1
AND
name_last
=
name_2
;
That will set the value of the possible_duplicate
column to 1 where the names in the humans
table match the
names in possible_duplicates
. When we’re ready, we can send
an email to these members telling them that we have two entries for
their names and asking if the entries are duplicates. If they are, we
might be able to merge the information together (such as by creating
another column for a second email address) and delete the duplicate
rows. As for the temporary table, it will be deleted when we close the
MySQL client.
With most databases, you will eventually need to delete rows from a table. To do this, you
can use the DELETE
statement. As mentioned a few times
earlier in this book, there is no UNDELETE
or
UNDO
statement for restoring rows that you delete. You can
recover data from backups, if you’re making backups as you should, but
it’s not quick and easy to restore data from them. If you use a storage
engine like InnoDB, there is a method for wrapping SQL statements in a
transaction that can be rolled back after you delete rows. However, once
you commit such a transaction, you’ll have to look to backups or other
cumbersome methods to restore deleted data. Thus, you should alwaysbe
careful when using the DELETE
statement.
The DELETE
statement works much like the
SELECT
statement in that you may delete rows based on
conditions in the WHERE
clause. You should always use the WHERE
clause, unless
you really want to leave an empty table with no rows. You may also include
an ORDER BY
clause to specify the order in which rows are
deleted, and a LIMIT
clause to limit the number of rows deleted in a table. The basic
syntax of the DELETE
statement is:
DELETE FROMtable
[WHEREcondition
] [ORDER BYcolumn
] [LIMIT row_count];
As the formatting indicates with square brackets, the
WHERE
, ORDER BY
, and LIMIT
clauses
are optional. There are additional options that may be given and
deviations to the syntax for deleting rows in multiple tables and for
deletions based on multiple tables. Let’s look at an example using this
simpler syntax for now.
Suppose after sending out a notice to members who we suspect of
having duplicate entries in the humans
table, one of them
confirms that her membership has been duplicated. The member,
Elena Bokova from Russia, asks us to delete the entry
that uses her old yahoo.com email address. To do
that, we could, but we won’t, enter this SQL statement:
DELETE
FROM
humans
WHERE
name_first
=
'Elena'
AND
name_last
=
'Bokova'
AND
email_address
LIKE
'%yahoo.com'
;
This SQL statement will delete any rows in which the criteria
expressed in the WHERE
clause are met. Notice that for
checking the email address, we used the LIKE
operator and the
wildcard (i.e., %
) to match any email ending with
yahoo.com.
The statement just shown would work fine, but we also need to delete
the related entry in the prize_winners
table. So we should
first get the human_id
for this row before deleting it.
That’s why I said we won’t enter this SQL statement. It’s tedious, though,
to execute one SQL statement to retrieve the human_id
, then
another to delete the row in the humans
table, and then
execute a third SQL statement to delete the related row in the
prize_winners
table. Instead, it would be better to change
the DELETE
statement to include both tables, deleting the
desired rows from both in one SQL statement. We’ll cover that in the next
subsection.
There are many situations where data in one table is dependent on
data in another table. If you use DELETE
to delete a row in
one table on which a row in another table is dependent, you’ll have
orphaned data. You could execute another DELETE
to remove
that other row, but it’s usually better to delete rows in both tables in
the same DELETE
statement, especially when there may be
many rows of data to delete.
The syntax for the DELETE
that deletes rows in
multiple tables is:
DELETE FROMtable
[,table
] USINGtable
[, . . . ] [WHEREcondition
];
In the FROM
clause, list the tables in a
comma-separated list. The USING
clause specifies how the tables are joined together (e.g., based
on human_id
). The WHERE
clause is optional. Like the UPDATE
statement,
because this syntax includes multiple tables, the ORDER BY
and LIMIT
clauses are not permitted. This syntax can be
tricky, but how much so may not be evident from looking at the syntax.
Let’s look at an example.
In the example at the end of the previous subsection, we needed to
delete rows from two tables that are related. We want to delete the rows
for Elena Bokova in which she has a
yahoo.com email address in both the
humans
and the prize_winners
tables. To do
that efficiently, we’ll enter this from the mysql client:
DELETE
FROM
humans
,
prize_winners
USING
humans
JOIN
prize_winners
WHERE
name_first
=
'Elena'
AND
name_last
=
'Bokova'
AND
email_address
LIKE
'%yahoo.com'
AND
humans
.
human_id
=
prize_winners
.
human_id
;
This DELETE
statement is similar to other data
manipulation statements (e.g., SELECT
,
UPDATE
). However, there is a difference in the syntax that
may be unexpected and confusing. The FROM
clause lists the
tables from which data is to be deleted. There is also a
USING
clause that lists the tables again and how they are
joined. What is significant about this distinction is that we must list
the tables in which rows are to be deleted in the FROM
clause. If we did not include prize_winners
in that list,
no rows would be deleted from it—only rows from humans
would be deleted.
There are several contortions and options in the syntax for
DELETE
. However, at this stage, the methods we reviewed in
this chapter will serve well for almost all situations you will
encounter as a MySQL and MariaDB developer or administrator.
The UPDATE
and DELETE
statements are very
useful for changing data in tables; they are essential to managing a MySQL
or MariaDB database. They have many possibilities for effecting changes to
tables with ease. You can construct very complex SQL statements with them
to change precisely the data you want to change or to delete exactly the
rows you want to delete. However, it can be confusing and difficult at
times. So be careful and learn these SQL statements well.
If you’re nervous at times about using the UPDATE
and
DELETE
statements, it’s because you should be. You can change
all of the rows in a table with one UPDATE
statement, and you
can delete all of the rows in a table with one DELETE
statement. On a huge database, that could be thousands of rows of data
changed or deleted in seconds. This is why good backups are always
necessary. Whenever using these two SQL statements, take your time to be
sure you’re right before you execute them. While you’re still learning
especially, it can be a good idea to make a duplicate of a table with its
data using the CREATE TABLE...SELECT
statement before
updating or deleting data. This SQL statement was covered in Essential Changes. This way if you make a major
mistake, you can put the data back as it was before you started.
Because of the problems you can cause yourself and others who will
use the databases on which you will work, practice using the
UPDATE
and DELETE
statements. More than any
other chapter in this book so far, you should make sure to complete the
exercises in the next section.
Exercises follow for you to practice using the UPDATE
and DELETE
statements. If you haven’t already, download the
rookery
and the birdwatchers
databases from the
MySQL Resources
site). This will give you some good-sized tables on which to
practice these SQL statements.
Use the CREATE TABLE...SELECT
statement (see Essential Changes) to make a copies of the
humans
and the prize_winners
tables. Name
the new tables humans_copy
and
prize_winners_copy
. Once you’ve created the copies, use
the SELECT
statement to view all of the rows in both of
the new tables. You should see the same values as are contained in the
original tables.
After you’ve done the previous exercise, use the
SELECT
statement to select all of the members from
Austria in the humans
table. You’ll need to use a
WHERE
clause for that SQL statement. The
country_id
for Austria is au. If you
have problems, fix the SQL statement until you get it right.
Next, using the same WHERE
clause from the
SELECT
statement, construct an UPDATE
statement to
change the value of the membership_type
column for
Austrian members to premium. In the same
UPDATE
statement, set the value of the
membership_expiration
to one year from the date you
execute the SQL statement. You will need to use the
CURDATE()
function inside the DATE_ADD()
function. The DATE_ADD()
function was shown in an
example earlier in this chapter (see Updating Specific Rows). The
CURDATE()
has no arguments to it, nothing to go inside
its parentheses. Both functions are covered in Chapter 11. If you can’t figure out how to
combine these function, you can enter the date manually (e.g.,
‘2014-11-03’ for November 3, 2014; include the
quote marks). Use the SELECT
statement to check the
results when you’re done.
Using the DELETE
statement, delete the rows
associated with the member named Barry Pilson
from the humans
and prize_winners
tables.
This was explained, along with an example showing how to do it, in
Deleting in Multiple Tables. After you do this,
use the SELECT
statement to view all of the rows in both
tables to make sure you deleted both rows.
Using the DELETE
statement, delete all of the rows
in the humans
table. Then delete all of the rows of data
in the prize_winners
tables. Use the SELECT
statement to confirm that both tables are empty.
Now copy all of the data from the humans_copy
and
prize_winners_copy
tables to the humans
and
prize_winners
tables. Do this with the
INSERT...SELECT
statement (covered in Inserting Data from Another Table).
After you’ve restored the data by this method, execute the
SELECT
statement again to confirm that both tables now
have all of the data. If you were successful, use the DROP
TABLE
statement to eliminate the humans_copy
and
prize_winners_copy
tables. This SQL statement was covered
in Chapters 4 and 5. If
you drop the wrong tables or if you delete data from the wrong tables,
you can always download the whole database again from the MySQL
Resources site.