A string is a value that can contain alphabetical characters, digits, and other characters (e.g., the ampersand, the dollar sign). Although a string can contain numbers, they are not considered numeric values. It’s a matter of context and perspective. For instance, postal codes in the United States are all digits, but you shouldn’t store them as integers because the postal code for 02138 would become 2138. You should use a string to store the postal code.
To make the handling of strings easier, MySQL provides many built-in
functions. You can format text for nicer results, make better expressions in
a WHERE
clause, or otherwise extract and manipulate data from a
string or column. Therefore, in this chapter, we’ll go through several
string functions, grouping them by similar features, and provide examples of
how they might be used.
Several string functions can format or reconstitute text for a better display. They allow you to store data in columns in a raw form or in separate components and then create the display you want when you retrieve the data.
For instance, in the humans
table, we are able to store
each member’s title, first name, and last name in separate columns because
we can put them together when needed. Breaking apart the names allows us
to sort easily based on last name or first name. You’ll see how this is
done in the next subsection.
The CONCAT()
function is very useful for pasting together the contents of different
columns, or adding some other text to the results retrieved from a
column. This is probably the most used string function—we’ve already
used it in several examples in previous chapters. Within the parentheses
of the function, in a comma-separated list, you give the strings,
columns, and other elements that you want to merge together into one
string.
Let’s look at an example of how it might be used within a SELECT
statement. Suppose we want to get a
list of a few members and birds that they’ve seen. We could enter an SQL
statement like this:
SELECT CONCAT(formal_title, '. ', name_first, SPACE(1), name_last) AS Birder,
CONCAT(common_name, ' - ', birds.scientific_name) AS Bird,
time_seen AS 'When Spotted'
FROM birdwatchers.bird_sightings
JOIN birdwatchers.humans USING(human_id)
JOIN rookery.birds USING(bird_id)
GROUP BY human_id DESC
LIMIT 4;
+----------------------+----------------------------------+--------------------+
| Birder | Bird | When Spotted |
+----------------------+----------------------------------+--------------------+
| Ms. Marie Dyer | Red-billed Curassow - Crax blu...| 2013-10-02 07:39:44|
| Ms. Anahit Vanetsyan | Bar-tailed Godwit - Limosa lap...| 2013-10-01 05:40:00|
| Ms. Katerina Smirnova| Eurasian Curlew - Numenius arq...| 2013-10-01 07:06:46|
| Ms. Elena Bokova | Eskimo Curlew - Numenius borea...| 2013-10-01 05:09:27|
+----------------------+----------------------------------+--------------------+
The first field displayed by this SQL statement is not a single
column from the table, but a CONCAT()
function that merges
the bird-watcher’s title, first name, and last name. We added a period
in quotes after the title, as we’ve decided to store the titles without
a period. We used quote marks to add spaces where needed. For the second
field, we concatenated the common name of each bird species with the
scientific name, and put spaces and a hyphen between them.
Without CONCAT()
, we might be tempted to combine
text in one column that really should be separated. For instance, we
might put the common and scientific names of bird species in one column.
Keeping values in separate columns makes a database more efficient and
flexible. String functions like CONCAT()
alleviate the
need to do otherwise.
A less common concatenating function is CONCAT_WS()
. It puts together columns
with a separator between each. The first argument is the element you
want to use as a separator (e.g., a space) and the rest of the arguments
are the values to be separated. This can be useful when making data
available for other programs.
For instance, suppose we have embroidered patches made with the name of the Rookery site on them and we want to mail one to each premium member. To do this, we use an advertising and marketing agency that will handle the mailing. The agency needs the names and addresses of members, and would like that data in a text file, with the values of each field separated by vertical bars. To do this, we’ll run mysql on the command line, passing a single statement to it:
mysql
-
p
--
skip
-
column
-
names
-
e
\
"SELECT CONCAT_WS('|', formal_title, name_first, name_last,
street_address, city, state_province, postal_code, country_id)
FROM birdwatchers.humans WHERE membership_type = 'premium'
AND membership_expiration > CURDATE();"
>
rookery_patch_mailinglist
.
txt
This example uses mysql with
several options. The --skip-column-names
option tells MySQL
not to display the column headings—we want just the data separated by
bars. The -e
option says that what follows within quotes is
to be executed. We then put the SQL statement within double quotes. The
first argument to CONCAT_WS()
is the vertical bar that the
company wants as a separator. The remaining arguments are the columns to
be strung together. After the closing double quotes, we use
>
to redirect the results to a text file that we’ll
email to the agency. There is a potential problem with the SQL statement
we used. If a column has a NULL value, nothing will be exported and no
bar will be put in the file to indicate an empty field. Here’s an
example of how the text file would look:
Ms|Rusty|Osborne|ch Ms|Elena|Bokova|ru
We have only four fields for these members, although we told MySQL
to export eight fields. If these two records were in the midst of
thousands of records, they would cause errors that might not be obvious
when imported. Although it’s more cumbersome, we should wrap each column
name in an IFNULL()
function. Then we can give a value to
display if the column is NULL, such as the word
unknown or a blank space. Here’s the same example
again, but with the IFNULL()
function:
mysql
-
p
--
skip
-
column
-
names
-
e
\
"SELECT CONCAT_WS('|', IFNULL(formal_title, ' '), IFNULL(name_first, ' '),
IFNULL(name_last, ' '), IFNULL(street_address, ' '),
IFNULL(city, ' '), IFNULL(state_province, ' '),
IFNULL(postal_code, ' '), IFNULL(country_id, ' '))
FROM birdwatchers.humans WHERE membership_type = 'premium'
AND membership_expiration > CURDATE();"
>
rookery_patch_mailinglist
.
txt
It looks daunting and excessive, but it’s simple to MySQL. The new contents of the text file follow:
Ms|Rusty|Osborne| | | | |ch Ms|Elena|Bokova| | | | |ru
That’s a manageable data file. When the results are like this, the marketing company can import all of the records without errors and then contact us to try to get the missing information. They can add it to their system without having to reimport the text file.
Occasionally, you might want to convert the text from a column to either all lowercase letters or
all uppercase letters. For these situations, there are LOWER()
and UPPER()
, which
can also be spelled LCASE()
and UCASE()
,
respectively. In the example that follows, the output of the first
column is converted to lowercase and the second to uppercase:
SELECT LCASE(common_name) AS Species,
UCASE(bird_families.scientific_name) AS Family
FROM birds
JOIN bird_families USING(family_id)
WHERE common_name LIKE '%Wren%'
ORDER BY Species
LIMIT 5;
+-------------------------+---------------+
| Species | Family |
+-------------------------+---------------+
| apolinar's wren | TROGLODYTIDAE |
| band-backed wren | TROGLODYTIDAE |
| banded wren | TROGLODYTIDAE |
| bar-winged wood-wren | TROGLODYTIDAE |
| bar-winged wren-babbler | TIMALIIDAE |
+-------------------------+---------------+
The QUOTE()
function takes a string and returns it enclosed in single quotes. But it
does a good deal more: it makes it input-safe by marking certain
characters that could cause trouble in SQL statements or other
programming languages. These characters are single quotes, backslashes,
null (zero) bytes, and Ctrl-Z characters. The QUOTE()
function precedes each of these with a backslash so that they won’t be
interpreted in some way or (in the case of a single quote) cause SQL to
prematurely terminate the string.
In the following example, we’re selecting a list of bird species named for a Prince or Princess:
SELECT QUOTE(common_name)
FROM birds
WHERE common_name LIKE "%Prince%"
ORDER BY common_name;
+----------------------------------+
| QUOTE(common_name) |
+----------------------------------+
| 'Prince Henry\'s Laughingthrush' |
| 'Prince Ruspoli\'s Turaco' |
| 'Princess Parrot' |
+----------------------------------+
Notice in the results that because of the QUOTE()
function, the strings returned are enclosed in single quotes, and any
single quotes within the strings are escaped with a backslash. This can
prevent errors if the value is passed to another program.
One of the problems with allowing the public to enter data into a website is
that they’re not always careful. They do things like adding spaces
before and after the text. There are a few functions for trimming any
leading or trailing spaces from the values of a column.
The LTRIM()
function eliminates any leading
spaces to the left. For columns with spaces on the right, RTRIM()
will remove them. A more
versatile trimming function, though, is TRIM()
. With it,
you can trim both left and right spaces.
These trim functions can be useful for cleaning data with the
UPDATE
statement. Let’s look at an example of their use. In
these SQL statements, we’ll use LTRIM()
and
RTRIM()
to eliminate both leading and trailing
spaces:
UPDATE
humans
SET
name_first
=
LTRIM
(
name_first
),
name_last
=
LTRIM
(
name_last
);
UPDATE
humans
SET
name_first
=
RTRIM
(
name_first
),
name_last
=
RTRIM
(
name_last
);
In this example, we trimmed the leading spaces with the first UPDATE
and the trailing spaces with the
second one. Notice that we set the value of the columns to the same
values, but with the strings trimmed. We can combine these functions
into one SQL statement like so:
UPDATE
humans
SET
name_first
=
LTRIM
(
RTRIM
(
name_last
)
),
name_last
=
LTRIM
(
RTRIM
(
name_last
)
);
You can always combine functions like this for a more dynamic
result. In this case, though, the TRIM()
function is
a better alternative. Here’s the same SQL statement using
it:
UPDATE
humans
SET
name_first
=
TRIM
(
name_first
),
name_last
=
TRIM
(
name_last
);
The TRIM()
function also offers more options. You
can specify something other than spaces to remove. For instance, suppose
we receive a small table with bird sightings from another bird-watcher
club, as we did in Row Subqueries.
However, in this table, the scientific names of bird species are within
double quotes. If we wanted to insert that data into our
bird_sightings
table, we could use the same SQL query as we
did before, with the addition of the TRIM()
function. Here
is the relevant excerpt, the last lines on which we join their table to
our birds
table:
…
JOIN
rookery
.
birds
ON
(
scientific_name
=
TRIM
(
BOTH
'"'
FROM
science_name
)
)
);
It may be difficult to see, but we’re enclosing the character that
we want trimmed—a double quote—within single quotes. The keyword
BOTH
isn’t actually necessary because it’s the
default—that’s why we didn’t specify it in the previous example. If you
don’t want to remove the string given from one end or the other, you can
specify LEADING
or TRAILING
, thus making
TRIM()
work like LTRIM()
or
RTRIM()
. The default string to trim is a space, as we have
seen.
When displaying data in web forms and other such settings, it’s sometimes
useful to pad the data displayed with dots or some other filler. This
can be necessary when dealing with VARCHAR
columns where
the width varies. Padding the results of a column selected can help the
user to see the column limits. There are two functions that may be used
for padding: LPAD()
and RPAD()
.
There is also SPACE()
, which pads the string with spaces:
SELECT CONCAT(RPAD(common_name, 20, '.' ),
RPAD(Families.scientific_name, 15, '.'),
Orders.scientific_name) AS Birds
FROM birds
JOIN bird_families AS Families USING(family_id)
JOIN bird_orders AS Orders
WHERE common_name != ''
AND Orders.scientific_name = 'Ciconiiformes'
ORDER BY common_name LIMIT 3;
+--------------------------------------------------+
| Birds |
+--------------------------------------------------+
| Abbott's Babbler....Pellorneidae...Ciconiiformes |
| Abbott's Booby......Sulidae........Ciconiiformes |
| Abbott's Starling...Sturnidae......Ciconiiformes |
+--------------------------------------------------+
Notice how all the bird families and orders are aligned
vertically. This is because we padded each value out to its maximum
width using RPAD()
. The first argument was the column to
read, the second was the total size of the resulting string we want, and
the third was a period so that periods apear for columns that have less
text. This happens to work because MySQL uses a fixed-width font. We
could uses spaces instead of dots for a similar effect. For web display,
we might use
as padding element for non-breaking spaces.
There are a few functions for extracting a piece of text from a string. You
indicate the point from which to start selecting text and how much text
you want. There are four such functions: LEFT()
,
MID()
, RIGHT()
, and SUBSTRING()
.
The SUBSTRING_INDEX()
function is also related. We’ll look
at each one here.
Let’s look at the LEFT()
, MID()
, and
RIGHT()
functions first. Suppose our marketing agency
acquires a table called prospects
containing a list of people
who are known to be bird-watchers. Each person’s title and first and last
name is stored in a column called prospect_name
, with email
addresses in another column. The prospect_name
column is a
fixed character length data type, CHAR(54)
. The marketing
agency tells us that the title is contained in the first four characters,
the first name in the next 25, and the last name in the remaining 25. For
the titles, they’re using only Mr. and
Ms. with a space after each—hence the first four
characters—but we will extract just the first two characters for our
tables. Let’s see how that column looks by executing a simple
SELECT
to retrieve four names:
SELECT prospect_name
FROM prospects LIMIT 4;
+--------------------------------------------------------+
| prospect_name |
+--------------------------------------------------------+
| Ms. Caryn-Amy Rose |
| Mr. Colin Charles |
| Mr. Kenneth Dyer |
| Ms. Sveta Smirnova |
+--------------------------------------------------------+
As you can see, the data is a fixed width for each element.
Normally, with a CHAR
column, MySQL would not store the
trailing spaces. Whoever created this table enforced the rigid format (4,
25, and 25 characters) by executing SET sql_mode =
'PAD_CHAR_TO_FULL_LENGTH';
before inserting data into the
column.
With an INSERT INTO…SELECT
statement and a few
functions, we can extract and separate the data we need and put these
prospects in a new table we created that we call
membership_prospects
. Let’s execute the SELECT
first to test our organization of the functions before we insert the
data:
SELECT LEFT(prospect_name, 2) AS title,
MID(prospect_name, 5, 25) AS first_name,
RIGHT(prospect_name, 25) AS last_name
FROM prospects LIMIT 4;
+-------+---------------------------+---------------------------+
| title | first_name | last_name |
+-------+---------------------------+---------------------------+
| Ms | Caryn-Amy | Rose |
| Mr | Kenneth | Dyer |
| Mr | Colin | Charles |
| Ms | Sveta | Smirnova |
+-------+---------------------------+---------------------------+
In the example’s LEFT()
function, the starting point
for extracting data is the first character. The number we gave as an
argument (i.e., 2), is the number of characters we want to extract
starting from the first. The RIGHT()
function is similar,
but it starts from the last character on the right, counting left. The
MID()
function is a little different. With it, you can
specify the starting point (i.e., the fifth character in our example) and
how many characters you want (i.e., 25 characters).
The SUBSTRING()
function is synonymous with MID()
and their syntax is
the same. By default, if the number of characters to capture isn’t
specified, it’s assumed that all the remaining ones are to be extracted.
This makes these functions work like the LEFT()
function. If
the second argument to SUBSTRING()
or MID()
is
a negative number, the function will start from the end of the string,
making it like the RIGHT()
function.
Because the SUBSTRING()
function is so versatile, we
can use it to accomplish all the text extraction in the previous example.
The equivalent SELECT
would look like this:
SELECT
SUBSTRING
(
prospect_name
,
1
,
2
)
AS
title
,
SUBSTRING
(
prospect_name
FROM
5
FOR
25
)
AS
first_name
,
SUBSTRING
(
prospect_name
,
-
25
)
AS
last_name
FROM
prospects
LIMIT
3
;
This example shows three ways to use
SUBSTRING()
:
SUBSTRING(prospect_name, 1, 2) AS
title
This has the same syntax we have used for other functions in this section: three arguments to specify the column with the text, the starting point for extracting text, and the number of characters to extract.
SUBSTRING(prospect_name FROM 5 FOR 25)
AS first_name
This shows a different, wordier syntax. The starting point here is 5 and the number of characters to extract is 25.
SUBSTRING(prospect_name, -25) AS
last_name
This specifies a starting point of −25 characters. Because it doesn’t specify how many to extract, MySQL takes the remaining characters from that starting point.
You can use whatever style you prefer.
The SUBSTRING_INDEX()
is similar to the previous functions, but looks for elements that
separate data within a string. For example, suppose the
prospect_name
column was constructed differently. Suppose
that instead of having fixed width for the title and names, the text had
vertical bars between them. This would be odd for data in a column, but it
is possible. Here’s how we could separate the same column containing the
vertical bar character as the separator (the first and third third lines
using SUBSTRING_INDEX()
are fairly understandable, but the
second one is more complex):
SELECT
SUBSTRING_INDEX
(
prospect_name
,
'|'
,
1
)
AS
title
,
SUBSTRING_INDEX
(
SUBSTRING_INDEX
(
prospect_name
,
'|'
,
2
),
'|'
,
-
1
)
AS
first_name
,
SUBSTRING_INDEX
(
prospect_name
,
'|'
,
-
1
)
AS
last_name
FROM
prospects
WHERE
prospect_id
=
7
;
The second argument to SUBSTRING_INDEX()
tells MySQL
how to break the string into the pieces of text we want. In our example,
we use '|'
to specify the vertical bar. The number in the
third argument tells how many elements to take. So in the first line here
we’re saying to get the first element. In the third line, because it has a
negative sign in front of the number, we’re saying to count from the end
and get one element there. In the second line, we’re using
SUBSTRING_INDEX()
twice, one call embedded inside the other.
The inner call extracts the first two elements. Using those results, we
then use an outer call to extract its first element starting from the
end.
Using SUBSTRING()
is much nicer, but you need to know
the starting point and how many characters to take. In our vertical bar
example, we’d need to know exactly where the vertical bars are in each
name. To do that, you will need to use other functions to search strings.
Those are covered in the next section.
MySQL and MariaDB do not have comprehensive functions for searching string based
on patterns. Yes, there’s the REGEXP
operator that permits some pattern matching. But this isn’t as robust
and isn’t fine tuned as easily as the capabilities offered by programming
languages like PHP and Perl. But there are a few functions that assist in
searching strings. We’ll look at some of them in this section.
MySQL and MariaDB have a few built-in functions that can find characters within a string. These functions return the location where the search parameter was found.
The LOCATE()
function returns the numeric starting point just left of the first
occurrence of a given substring in a given string. It does not search
beyond this point. Let’s look at an example. Suppose we want a list of
Avocet birds—they’re a type of shore birds that is part of the
Recurvirostridae family. We could enter something
like this:
SELECT common_name AS 'Avocet'
FROM birds
JOIN bird_families USING(family_id)
WHERE bird_families.scientific_name = 'Recurvirostridae'
AND birds.common_name LIKE '%Avocet%';
+-------------------+
| Avocet |
+-------------------+
| Pied Avocet |
| Red-necked Avocet |
| Andean Avocet |
| American Avocet |
+-------------------+
Now suppose we want to eliminate the word
Avocet from the names returned. There are a few
ways we might do that: one way is to use the LOCATE()
function to find the word Avocet, and extract all
text before it with the SUBSTRING()
function:
SELECT
SUBSTRING(common_name, 1, LOCATE(' Avocet', common_name) ) AS 'Avocet'
FROM birds
JOIN bird_families USING(family_id)
WHERE bird_families.scientific_name = 'Recurvirostridae'
AND birds.common_name LIKE '%Avocet%';
+-------------+
| Avocet |
+-------------+
| Pied |
| Red-necked |
| Andean |
| American |
+-------------+
That’s a cumbersome example, but it shows you how you can use
LOCATE()
in conjunction with other functions to get what
you want from a string. Let’s look at another example.
Earlier in this chapter, in Trimming and Padding Strings, we had some examples
involving merging data from another bird-watcher group. That included
using the TRIM()
function to remove quotes from
around the scientific names of birds spotted by people in that group.
Let’s use that column again, but assume that it doesn’t have quotes.
Instead, the bird species is given with its bird family in this format:
bird species - bird family
. For this, we can
use the LOCATE()
function to locate the hyphen and then
the SUBSTRING()
to get the family name for the
JOIN
clause in that earlier example. Here’s just the excerpt from the
JOIN
clause:
…
JOIN
rookery
.
birds
ON
(
scientific_name
=
SUBSTRING
(
science_name
,
LOCATE
(
' - '
,
science_name
)
+
3
)
);
Let’s pull this apart to understand it better. First, let’s focus
on the inner function, the LOCATE()
. The search parameter
it’s given is a hyphen surrounded by spaces. The
science_name
column is the string to search. This function
will return the position in the string where the search parameter is
found. We’re adding 3 to that because the search parameter is three
characters long—in other words, LOCATE()
gives us the
point before the separator and we want to get the
substring after the end of the separator. So the
results of LOCATE()
+ 3 is given as the starting point for
the SUBSTRING()
function. Because we’re not specifying how
many characters we want, MySQL will extract the remaining characters.
That will give us the scientific name of the bird in the table we’re
joining to birds
.
The POSITION()
function works like LOCATE()
, except that it takes
the keyword IN
instead of a comma between the substring
you’re searching for and the containing string:
POSITION
(
' - '
IN
science_name
)
In addition, LOCATE()
accepts an optional argument
to indicate the starting point for the search, which is not available in POSITION()
.
Another function for searching a string is FIND_IN_SET()
. If you have a string that
contains several pieces of data separated by commas, this function tells
you which element in that set of data contains the search pattern you
give it. To understand this better, suppose that we want to get a list
of members from Russia, but ordered by the date when the members joined.
We would enter this:
SELECT human_id,
CONCAT(name_first, SPACE(1), name_last) AS Name,
join_date
FROM humans
WHERE country_id = 'ru'
ORDER BY join_date;
+----------+-------------------+------------+
| human_id | Name | join_date |
+----------+-------------------+------------+
| 19 | Elena Bokova | 2011-05-21 |
| 27 | Anahit Vanetsyan | 2011-10-01 |
| 26 | Katerina Smirnova | 2012-02-01 |
+----------+-------------------+------------+
Now suppose that we want to know the position of the member
Anahit Vanetsyan in the list of Russian members. We
can see easily from the results just shown that she is the third member
from Russia to join. That’s because there are very few results here.
Imagine if the results contained hundreds of names. We could use
FIND_IN_SET()
with a subquery to determine this:
SELECT FIND_IN_SET('Anahit Vanetsyan', Names) AS Position
FROM
(SELECT GROUP_CONCAT(Name ORDER BY join_date) AS Names
FROM
( SELECT CONCAT(name_first, SPACE(1), name_last) AS Name,
join_date
FROM humans
WHERE country_id = 'ru')
AS derived_1 )
AS derived_2;
+----------+
| Position |
+----------+
| 2 |
+----------+
This is a pretty complex SQL statement. The innermost
SELECT
is essentially the query we saw earlier, but
returning just the full name and join date for each Russian person.
These results are fed to GROUP_CONCAT
, which produces a single huge string containing all the names.
The outermost SELECT
finds the name we want and returns its
position.
When you put a SELECT
statement inside parentheses
and derive a table from it that you will use with an outer statement,
you must give that derived table a name using AS
. For
naming simplicity, we’ve named the derived tables in this chapter
derived_1
and derived_2
. Almost any unique
name is fine.
The statement can be useful if we associate it with a user profile page on the Rookery website. We might want to use it to show members where they rank in different lists, such as most sightings of birds or most sightings of birds in a particular category.
FIND_IN_SET()
returns 0 if the string is not found
in the set or if the string list is empty. It returns NULL if the value
of either argument is NULL.
There will be times you want to know how long a string is. There are a
few functions that return the character length of a string. This can be
useful when adjusting formatting or making other decisions related to a
string, and they are commonly used with functions like
LOCATE()
and SUBSTRING()
.
The CHAR_LENGTH()
or
CHARACTER_LENGTH()
function returns the number of characters in a string. This could
be helpful when different rows have different-length strings in a
particular column.
For instance, suppose we want to display on the Rookery website a
list of the birds most recently sighted by members, as recorded in the
bird_sightings
table. We’ll include the common and
scientific name and other information about the bird species. Suppose
that we want to also include the comments
that the member
entered when they recorded the sighting. Because this column can contain
a lot of text, we want to know how many characters it contains when
displaying it. If there’s too much (i.e., more than 100 characters),
we’ll limit the text and include a link on the web page to view all of
the text. To check the length, we could construct an SQL statement like
this that would be part of a program:
SELECT
IF
(
CHAR_LENGTH
(
comments
)
>
100
),
'long'
,
'short'
)
FROM
bird_sightings
WHERE
sighting_id
=
2
;
Here we’re using CHAR_LENGTH()
to count the number
of characters in the comments
column for the row selected.
We’re using the IF()
function to determine whether the
character length of the comments is greater than 100 characters. If it
is, the function will return the word long
. If not, it will
return short
. If this SQL statement was used in an API
script, the value in the WHERE
clause for the
sighting_id
could be dynamically replaced for each bird
sighting.
CHAR_LENGTH()
understands the character set in
current use, as we touched on in Creating a Database. Characters that take up
multiple bytes—usually present in Asian languages—are still considered
one character. In contrast, the LENGTH()
function returns the number of
bytes in a given string. Note that there are eight bits to a byte and
that Western languages normally use one byte for each letter. If you
want to count the number of bits, use the BIT_LENGTH()
function.
As an example, suppose we notice that the comments
column of the bird_sightings
table contains some odd binary
characters. They have been entered into the column through the mobile
application we provide to members. To narrow the list of rows that have
these odd characters so that we can remove them, we can execute the
following SQL statement:
SELECT
sighting_id
FROM
bird_sightings
WHERE
CHARACTER_LENGTH
(
comments
)
!=
LENGTH
(
comments
);
This will give us the sighting_id
for the rows in
which the number of characters does not equal the number of bytes
in the comments
column.
The previous subsection used the output of CHAR_LENGTH()
as input to an IF()
statement so that we had a choice of
what to return. In this subsection, we’ll look at some functions that
compare strings, which can also be handy when used with a logical
function such as IF()
or in a WHERE
clause.
Let’s consider a situation where we might use one of these
functions—specifically, the STRCMP()
function. The name of the
function, in the manner much loved by computer programmers, is a
compressed version of “string compare.”
Email addresses are critical for communicating with members so we
decide to require new members to enter their email address twice during
the registration process to ensure accuracy. However, in case the
connection is lost in the process or the joining member does not correct
a problem with their email address, we want to keep both addresses until
they do. So we’ll add a row to the humans
table to store
whatever information they give us, and then store both email addresses
in another table to compare them. For that comparison, we could use the
STRCMP()
function in an SQL statement.
This scenario is the kind of situation that you would automate with an API program, a program you would create to interface with MySQL or MariaDB. It would store the SQL statements needed for processing the information the new member enters from the website. To start the process related to checking the email, we might create a table that will store the member’s identification number and the two email addresses. We could do that like so:
CREATE
TABLE
possible_duplicate_email
(
human_id
INT
,
email_address1
VARCHAR
(
255
),
email_address2
VARCHAR
(
255
),
entry_date
datetime
);
Now when new members register, after their information has been
stored in the humans
table, our web interface can store
conditionally the two email addresses provided in the
possible_duplicate_email
table. It might look like
this:
INSERT
IGNORE
INTO
possible_duplicate_email
(
human_id
,
email_address_1
,
email_address_2
,
entry_date
)
VALUES
(
LAST_INSERT_ID
(),
'bobyfischer@mymail.com'
,
'bobbyfischer@mymail.com'
)
WHERE
ABS
(
STRCMP
(
'bobbyrobin@mymail.com'
,
'bobyrobin@mymail.com'
)
)
=
1
;
For the email addresses, I’ve displayed the plain text. But in a
more realistic example, this SQL statement might be embedded in a PHP
script and would refer to variables (e.g., $email_1
and
$email_2
) where the email addresses are here.
Using the STRCMP()
in the WHERE
clause,
if the email addresses match, STRCMP()
returns
0. If the addresses don’t match, it will return 1 or -1. It returns -1
if the first value is alphabetically before the second. To allow for
that possibility, we put it inside of ABS()
, which changes the value to the absolute value—it makes
negative values positive. So, if the two email addresses don’t match,
the statement will insert the addresses into the
possible_duplicate_email
table for an administrator to
review. Incidentally, that would normally return an error message, but
IGNORE
flag tells MySQL to ignore errors.
Another comparison function is MATCH() AGAINST()
, which searches for a
string and returns matching rows from the table. It even ranks the rows
by relevance, but that is beyond the scope of this chapter. Among the
complications of MATCH() AGAINST()
, it works only on
columns that have been indexed with a special FULLTEXT
index. To test this
function, we’ll first add a FULLTEXT
index to the
bird_sightings
table, basing it on the
comments
column because that’s a TEXT
column:
CREATE
FULLTEXT
INDEX
comment_index
ON
bird_sightings
(
comments
);
Now you can use MATCH() AGAINST()
. It is commonly
found in WHERE
clauses as a condition to find
columns containing a given string. Text in the given string, which is
delimited by spaces or quotes, is parsed into separate words. Small
words (i.e., three characters or fewer) are generally ignored. Here is
an example:
SELECT CONCAT(name_first, SPACE(1), name_last) AS Name, common_name AS Bird, SUBSTRING(comments, 1, 25) AS Comments FROM birdwatchers.bird_sightings JOIN birdwatchers.humans USING(human_id) JOIN rookery.birds USING(bird_id) WHERE MATCH (comments) AGAINST ('beautiful'); +-------------------+-----------------+---------------------------+ | Name | Bird | Comments | +-------------------+-----------------+---------------------------+ | Elena Bokova | Eskimo Curlew | It was a major effort get | | Katerina Smirnova | Eurasian Curlew | Such a beautiful bird. I | +-------------------+-----------------+---------------------------+
In the WHERE
clause, we’re able now to match the
comments
column against the string beautiful
.
The comments
column from the
birdwatchers.bird_sightings
is combined in the results with
three other columns: common_name
from
rookery.birds
and name_first
and
name_last
from birdwatchers.humans
.
We’re using the SUBSTRING
function to limit the
amount of text displayed. This cuts off the text abruptly. You could use
the CONCAT()
function to append ellipses to indicate there is more text. You
might also use the IF()
function to determine whether there is more text before appending
ellipses. There are other functions you can use for locating the
beautiful
within the column so that you can display only
the text around it. We’ll cover that kind of function later in this
chapter.
If you want to insert or replace certain text from a column (but not all of its
contents), you could use the INSERT()
function. Don’t
confuse this with the INSERT
statement. The syntax of this
function consists of the string or column into which you want to insert
text, followed by the position in which to insert text. You may specify
also how much text to delete from that point, if you want. Finally, you
give the text to insert. Let’s look at some examples of this
function.
We’ll start with a simple example. Suppose that on a page of the Rookery site, we are thinking of adding some text to the common names of bird species with the word Least in their name. We want to explain that it means Smallest, so that uninformed birders don’t think it means these birds are the least important. To test this, we enter this SQL statement:
SELECT INSERT(common_name, 6, 0, ' (i.e., Smallest)')
AS 'Smallest Birds'
FROM birds
WHERE common_name LIKE 'Least %' LIMIT 1;
+------------------------------+
| Smallest Birds |
+------------------------------+
| Least (i.e., Smallest) Grebe |
+------------------------------+
The first argument is the column containing the string we’re
manipulating. The second argument is the starting point for inserting
text. Based on the WHERE
clause, we’re looking for common
names that start with Least. That’s 5 characters.
We add 1 to that because the starting point for INSERT
is
1. The third argument specifies how many characters after the starting
point should be replaced. In this case, we’re just inserting text, not
replacing any.
The SQL statement uses INSERT()
to change the
results set, not the data in the table. So we could use the
INSERT()
function to display the common names like this to
new members for the first month who have identified themselves as new to
bird-watching. We would have to construct a more complex SQL statement
to check who is new, but this example shows you how to insert text
within a string. Let’s look now at an example in which we will replace
data using INSERT()
.
Suppose we discover that parts of some of the common bird species
names are abbreviated in the birds
table (e.g.,
Great is abbreviated as Gt.).
We prefer not to have any abbreviations for the common names. Before
changing the data, we’ll execute a SELECT
statement to test
our use of the INSERT()
function:
SELECT common_name AS Original,
INSERT(common_name, LOCATE('Gt.', common_name), 3, 'Great') AS Adjusted
FROM birds
WHERE common_name REGEXP 'Gt.' LIMIT 1;
+------------------+--------------------+
| Original | Adjusted |
+------------------+--------------------+
| Gt. Reed-Warbler | Great Reed-Warbler |
+------------------+--------------------+
We’ve already reviewed the arguments of the INSERT()
function in the previous example. The extra twist here is in the second
argument, which contains the LOCATE()
. We’re using that function to
determine the position in the string where text is to be replaced. In
the previous example, we assumed that the common name would start with
the string we wanted to modify. In this case, we’re not assuming the
position of the string within the column. Instead, we’re letting MySQL
find it for us.
Another difference in this example is the third element: we’re telling the function to replace three characters (i.e., the length of Gt.) from the starting point with the text given for the fourth argument (i.e., Great). Although the text we’re adding is more than three characters, it’s fine because when we update the table later, we’re updating a column with plenty of space to hold the results.
If LOCATE()
does not find the string we give it, it
returns 0. A value of 0 for the position in the INSERT()
function negates it and returns the value of
common_name
unchanged. So with this usage of
INSERT()
, because of the inclusion of
LOCATE()
for the starting location, the WHERE
clause is unnecessary—except to see that it works where we want it
to.
Now that we’ve verified that our combination of functions works correctly, we can update the data by entering the following SQL statement:
UPDATE
birds
SET
common_name
=
INSERT
(
common_name
,
LOCATE
(
'Gt.'
,
common_name
),
3
,
'Great'
)
WHERE
common_name
REGEXP
'Gt.'
;
There is an alternative to using INSERT()
for
replacing text in a string. In the previous example, we had to use the
LOCATE()
function to determine the location of the text
where we wanted to insert text and we had to tell it how many characters
to replace. A simpler function for replacing text is
REPLACE()
. We could use this function to replace all occurrences of
Gt. with Great in the
common_name
column. Let’s test that with a SELECT
statement like so:
SELECT common_name AS Original,
REPLACE(common_name, 'Gt.', 'Great') AS Replaced
FROM birds
WHERE common_name REGEXP 'Gt.' LIMIT 1;
+------------------+--------------------+
| Original | Replaced |
+------------------+--------------------+
| Gt. Reed-Warbler | Great Reed-Warbler |
+------------------+--------------------+
This works much better. We can use the REPLACE()
with the arguments we have here and enter the following
UPDATE
to change the data in the table:
UPDATE birds
SET common_name = REPLACE(common_name, 'Gt.', 'Great');
Query OK, 8 rows affected (0.23 sec)
Rows matched: 28891 Changed: 8 Warnings: 0
Notice that we didn’t include the WHERE
clause, but
the results message says that only eight rows were changed. This is
because there were only eight rows that contained
Gt. in the common_name
column.
Updating data in a table with that many rows is intimidating and
dangerous without a WHERE
clause. That’s why it’s good to
use them and to test the parameters with a SELECT
statement first.
There may be times when you will have to work with tables created by
people who might not have made the best choices for column data types.
Sometimes you can alter the tables, but sometimes you may not be allowed
to do so. For manipulating data from such tables or for importing data
from them, you can use the CAST()
or CONVERT()
functions to change the data type of columns. The effect just takes place
within your SQL statement, not the database itself. Let’s look at some
examples of how and why you might use these two functions, which are
basically synonymous except for a minor syntax difference.
Suppose we’re given a table containing images of birds in a
particular area, showing female, male, and juvenile color patterns. One of
the columns contains numbers for ordering birds based loosely on the type
of bird and the date when usually seen in the area. This column isn’t a
numeric data type like INT
, but is CHAR
. When we
sort the data based on this column, MySQL will sort the rows lexically,
not numerically. Here’s an example of how that might look:
SELECT sorting_id, bird_name, bird_image
FROM bird_images
ORDER BY sorting_id
LIMIT 5;
+------------+-----------------+----------------------------+
| sorting_id | bird_name | bird_image |
+------------+-----------------+----------------------------+
| 11 | Arctic Loon | artic_loon_male.jpg |
| 111 | Wilson's Plover | wilson_plover_male.jpg |
| 112 | Wilson's Plover | wilson_plover_female.jpg |
| 113 | Wilson's Plover | wilson_plover_juvenile.jpg |
| 12 | Pacific Loon | pacific_loon_male.jpg |
+------------+-----------------+----------------------------+
Notice that the rows with a sorting_id
starting with
11n are listed before one with the value of 12.
That’s because MySQL is reading the data as characters and not numbers.
The two Loons should be together, before the Plovers are listed.
We can use the CAST()
function to cast the values
taken from sorting_id
into the INT
data
type:
SELECT sorting_id, bird_name, bird_image
FROM bird_images ORDER BY CAST(sorting_id AS INT) LIMIT 5;
+------------+-----------------+----------------------------+
| sorting_id | bird_name | bird_image |
+------------+-----------------+----------------------------+
| 11 | Arctic Loon | artic_loon_male.jpg |
| 12 | Pacific Loon | pacific_loon_male.jpg |
| 111 | Wilson's Plover | wilson_plover_male.jpg |
| 112 | Wilson's Plover | wilson_plover_female.jpg |
| 113 | Wilson's Plover | wilson_plover_juvenile.jpg |
+------------+-----------------+----------------------------+
That worked correctly. Let’s suppose now that we don’t want to use
sorting_id
, but instead the gender_age
column.
This is an ENUM
column specifying that the image file is for a male
,
female
, or a juvenile
. The color patterns of
most birds deviate based on these factors. Let’s see how the results will
look if we sort based on this column:
SELECT bird_name, gender_age, bird_image
FROM bird_images
WHERE bird_name LIKE '%Plover%'
ORDER BY gender_age
LIMIT 5;
+-----------------+------------+----------------------------+
| bird_name | gender_age | bird_image |
+-----------------+------------+----------------------------+
| Wilson's Plover | male | wilson_plover_male.jpg |
| Snowy Plover | male | snowy_plover_male.jpg |
| Wilson's Plover | female | wilson_plover_female.jpg |
| Snowy Plover | female | snowy_plover_female.jpg |
| Wilson's Plover | juvenile | wilson_plover_juvenile.jpg |
+-----------------+------------+----------------------------+
Notice that the rows are grouped together based on the
gender_age
column, but those values are not in alphabetical
order (i.e., female rows should be before
male rows). This is because of how the enumerated
values are listed in the gender_age
column:
SHOW COLUMNS FROM bird_images LIKE 'gender_age' \G *************************** 1. row *************************** Field: gender_age Type: enum('male','female','juvenile') Null: YES Key: Default: NULL Extra:
To MySQL, the value of male
for the
gender_age
column is stored as 1, and female
as
2. This controls the order of the display, even though the values are
rendered as text. If we use though the CAST()
or the
CONVERT()
function in the ORDER BY
clause, MySQL will sort the
results based on their rendered values and not their column values. Here’s
how that would look:
SELECT bird_name, gender_age, bird_image
FROM bird_images
WHERE bird_name LIKE '%Plover%'
ORDER BY CONVERT(gender_age, CHAR)
LIMIT 5;
+-----------------+------------+----------------------------+
| bird_name | gender_age | bird_image |
+-----------------+------------+----------------------------+
| Wilson's Plover | female | wilson_plover_female.jpg |
| Snowy Plover | female | snowy_plover_female.jpg |
| Wilson's Plover | juvenile | wilson_plover_juvenile.jpg |
| Snowy Plover | juvenile | snowy_plover_juvenile.jpg |
| Wilson's Plover | male | wilson_plover_male.jpg |
+-----------------+------------+----------------------------+
Notice that for the CONVERT()
function, a comma is
used to separate the string given from the data type instead of the
AS
keyword. The data type given as the second argument can
be BINARY
, CHAR
,
DATE
, DATETIME
, SIGNED [INTEGER]
,
TIME
, or UNSIGNED
[INTEGER]
. BINARY
converts a string to a binary string. You can add also CHARACTER
SET
to use a different character set from the default for the value
given. To convert the character set of a given string to another, you have
to use the USING
option, like so:
SELECT
bird_name
,
gender_age
,
bird_image
FROM
bird_images
WHERE
bird_name
LIKE
'%Plover%'
ORDER
BY
CONVERT
(
gender_age
USING
utf8
)
LIMIT
5
;
Some column data types allow large amounts of data. For instance,
the BLOB
column can store plenty. To reduce the
size of tables that use this column data type, you can compress the data
it contains when inserting the data. The COMPRESS()
function
compresses a string and the UNCOMPRESS()
function decompresses a compressed string. If you want to use them,
MySQL has to have been compiled with a compression library (i.e.,
zlib
). If it wasn’t, a NULL value will be returned when using
COMPRESS()
. Let’s look at some examples of their use.
The humans
table has a column for
birding_background
which is a BLOB
. Members can
write as much as they like about themselves, which could result in pages
of information on their experiences and education as bird-watchers. This
could potentially slow down queries and updates if many members do this.
So we decide to use COMPRESS()
to compress the member’s
background when inserting it into the humans
table. Here’s
how that might look:
INSERT
INTO
humans
(
formal_title
,
name_first
,
name_last
,
join_date
,
birding_background
)
VALUES
(
'Ms'
,
'Melissa'
,
'Lee'
,
CURDATE
(),
COMPRESS
(
"lengthy background..."
));
This SQL statement inserts a new member’s information into the
humans
table—it has more columns than shown here, but we’re
trying to keep this example simple. The statement uses the
COMPRESS()
function to compress the background information
given (which isn’t much for this simple example). You would normally get
such data from an API variable using something like PHP to store text
entered by the user through a web page. So instead of the text shown here,
you would use a variable (e.g.,
$birding_background
).
To see how the data looks in the compressed form, we could do this:
SELECT birding_background AS Background
FROM humans
WHERE name_first = 'Melissa' AND name_last = 'Lee' \G
*************************** 1. row ***************************
Background: x#####/ɨTHJL##/######## Z######
Notice that the results are not normal text. The mysql client substitutes a hash sign
(#
) for binary values. In order to see the text contained in
this compressed format, we would use UNCOMPRESS()
. It returns
NULL if the string is not compressed or if MySQL wasn’t compiled with
zlib
:
SELECT UNCOMPRESS(birding_background) AS Background
FROM humans
WHERE name_first = 'Melissa' AND name_last = 'Lee' \G
*************************** 1. row ***************************
Background: lengthy background...
For small amounts of text like this, compression takes more space than the plain text. But for large amounts of text, it will save plenty of space. So use it sparingly and where appropriate.
There are more string functions available in MySQL and MariaDB. A few of the functions mentioned here have aliases or close alternatives. There are also functions for converting between ASCII, binary, hexadecimal, and octal strings. And there are also string functions related to text encryption and decryption that were not mentioned. However, I think this chapter has given you a good collection of common string functions that will assist you in building more powerful SQL statements and formatting results to be more attractive.
String functions are very necessary to developing databases in MySQL and MariaDB. You need to know them well. To become an expert, you need to practice using them, so be sure to complete all of the following exercises.
One of the most commonly used string functions is
CONCAT()
. Construct a SELECT
statement to
query the humans
table. Use the CONCAT()
function to merge together values from the name_first
column with the name_last
column. Use the
SPACE()
function to put a space between them in the
results. Give that field an alias of Full Name
—and
remember to put quotes around this alias, as it contains a space.
Limit the results to four people. Execute it to be sure it has no
errors.
Add a WHERE
clause to that SELECT
statement. For the condition of the WHERE
clause, copy
the CONCAT()
you just assembled. List rows where the
name is in a set of the following names: Lexi Hollar, Michael
Zabalaoui, and Rusty Johnson.
After you successfully execute the SELECT
with that
WHERE
clause, add an ORDER BY
clause to sort
the data based on the concatenated name. Do it without using
CONCAT()
.
Construct a SELECT
statement that selects, from the
birds
table, the common_name
and the
scientific_name
. Use a string function to change the
scientific_name
to all lowercase letters. Use the
CONCAT()
function to put them into one field, with a
space after the common name, followed by the scientific name in
parentheses—for example, African Desert Warbler (sylvia
deserti). Don’t use the SPACE()
function.
Instead, put the spaces and parentheses within single quote marks
within the CONCAT()
. Give the
resulting field an alias of Bird Species
. Limit the
results to 10 rows.
After you’ve successfully executed that SQL statement, modify
that statement to join in the bird_families
and the
bird_orders
tables. The JOIN
statement was
covered extensively in Unifying Results.
Then add the scientific_name
columns from both of these
tables to the fields returned.
Execute this modified statement to make sure your joins are
correct. When they are, move the scientific_name
columns
for the two additional tables into the CONCAT()
. Using
the RPAD()
function, put dots after the bird species
name, before the bird family and the bird order names. The results for
a field will look like this:
Speckled Warbler (pyrrholaemus sagittatus)...Acanthizidae...Passeriformes
This will probably require you to use CONCAT()
twice. Use a WHERE
clause to list only Warblers. Limit
the results to 10 rows.
Construct another SELECT
statement to list all of
the common names of bird species from the birds
table,
where the common name contains the word Shrike.
When you execute that statement you should see some names with a
hyphen after the word Shrike. Add the
REPLACE()
function to the SELECT
statement
to replace those hyphens with a space in the results, and then execute
the SQL statement again.
Some of the names of the birds in the results from the
SELECT
statement in the previous exercise have more than
one hyphen (e.g., Yellow-browed Shrike-Vireo).
Redo that SQL statement to replace only the hyphens after the word
Shrike (e.g., to look like this:
Yellow-browed Shrike Vireo). In order to do this,
use LOCATE()
with REPLACE()
. You will need
to use LOCATE()
twice: one within another.
True Shrikes are of the Laniidae family.
Construct another SELECT
to select the common bird names
with the word Shrike, but belonging to
Laniidae. This will require a join to the
bird_families
table. Use one of the substring functions
like SUBSTRING()
to extract the words before
Shrike. To do this, you will need to use
LOCATE()
or a similar function. Then use
CONCAT()
to display that value extracted after
Shrike with a comma and space in between. The
results for each field should look like this: Shrike,
Rufous-tailed
. Give the field an alias of
Shrikes
.
The humans table contains entries in which the member used
either all lowercase letters or all uppercase letters to enter their
first and last names (e.g., andy oram
and MICHAEL
STONE
). Use UPDATE
to change the names to title
case (i.e., the first letter capital and the rest lowercase). First
experiment with SELECT
to make sure you have the
functions organized properly. Use the UCASE()
and
LCASE()
functions to set the cases. You will need to use
SUBSTRING()
or a similar function a few times, and
CONCAT()
a couple of times.