Chapter 1

Introducing MySQL

IN THIS CHAPTER

check Understanding why you need a database

check Seeing how MySQL works

check Exploring the advanced features of MySQL

Computers are all about storing information. However, unlike that junk drawer in your kitchen that contains multiple shards of paper with names and phone numbers scribbled on them, you want to store your dynamic web application data in an orderly fashion. After all, you wouldn’t want to mix up the data from your astrophysics experiments with your bowling league scores!

The MySQL database server provides a user-friendly platform for you to organize your application data, making it simple to identify which data belongs to which application and easy for the application to access the data, all while maintaining security so the right people can only get to the right data. This chapter describes just why you need a database for your dynamic web applications, and why you should choose the MySQL database server.

Seeing the Purpose of a Database

With PHP, you have a few different options for storing persistent data in your application to retrieve at a later time. One method is to use the PHP file system functions to create a standard text file on the server to store the application data, and then read the data back as necessary.

One downside to using standard text files to store your application data is that it’s hard to find a specific data item buried in the text file. Standard text files are often called “flat files” because you can’t create any type of relationships in the data to make searching for specific information easier. Your application must open the text file and read each line until it finds the data it needs. That’s fine for small amounts of data, but for large amounts of data that can be slow, especially if there are thousands of site visitors all trying to access their data from the same file at the same time.

To solve that problem, most web developers have turned to using databases. Databases organize data in a manner making it easier for the database server to insert, find, modify, and delete data. There are lots of different database types available, but one of the most popular is the relational database system. This section describes how relational databases work with data to help speed up your web application.

How databases work

Microsoft Access is by far the most popular end-user database tool developed for commercial use. Many Windows users, from professional accounts to bowling league secretaries, use Access to track data. It provides an easy, intuitive user interface, allowing novice computer users to quickly produce queries and reports with little effort.

However, despite its user-friendliness, Access has its limitations. To fully understand how MySQL differs from Access, you must first understand how database systems are organized.

There is more to a database than just a bunch of data files. Most databases incorporate several layers of files, programs, and utilities, which all interact to provide the database experience. The whole package is referred to as a database management system (DBMS).

There are different types of DBMS packages, but they all basically contain the following parts:

  • A database engine
  • One or more database files
  • An internal data dictionary
  • A query language interface

The database engine is the heart and brains of the DBMS. It controls all access to the data, which is stored in the database files. Any application (including the DBMS itself) that requires access to data must go through the database engine (see Figure 1-1).

image

FIGURE 1-1: A simple database management system.

The database engine uses an internal data dictionary to define how the database operates, the type of data that can be stored in the database files, and the structure of the database. It basically defines the rules used for the DBMS. Each DBMS has its own data dictionary.

If you’re a user running a simple database on Access, you probably don’t even realize you’re using a database engine. Access keeps much of the DBMS work under the hood and away from users. When you start Access, the database engine starts, and when you stop Access, the database engine stops.

In MySQL, the database engine runs as a service that is always running in the background on the server. Users run separate application programs that interface with the database engine while it’s running. Each application can send queries to the database engine and process the results returned. When the application stops, the MySQL database engine continues to run, waiting for commands from other applications.

Both Access and MySQL require one or more database files to be present to hold data. If you work with Access, you’ve seen the .mdb database files. These files contain the data defined in tables created in the Access database. Each database has its own .mdb file.

In the Access environment, if two or more applications want to share a database, the database file must be located on a shared network drive available to all the applications. Each application has a copy of the Access database engine program running on the local workstation, which points to the common database file, as shown in Figure 1-2.

image

FIGURE 1-2: A shared Microsoft Access environment.

Where this model falls apart is that there are multiple database engines, all trying to access the database files across a network environment. This generates large amounts of data on the network and slows down the performance of the individual database engines.

In the MySQL model, the database engine and database files are always on the same computer. Queries and reports run from separate applications, but they all send requests to the common database engine, as shown in Figure 1-3.

image

FIGURE 1-3: A multiuser MySQL environment.

As you can see from Figure 1-3, the MySQL database engine accepts data requests from multiple users across the network. All the database access is performed on the local system running the MySQL server, so the data interaction with the database files stays on the local system. The database engine only sends the query or report results across the network to the applications.

This feature alone makes using MySQL a better database choice in multiuser database projects.

Relational databases

Databases are all about arranging data to make finding information faster. Relational database theory arranges data in three levels: databases, tables, and data fields.

Databases

A database groups related data into a single container. The database is the highest level or grouping of data on the relational database server. The server allows you to create multiple databases, all accessible from the same server service running on the server.

tip To help keep things organized, it's a good idea to create a separate database for each application you’re hosting on the server. This helps to separate data elements and eliminates accidents caused by accessing the wrong data from the wrong application.

remember Each database you create must have a unique name on the server. To help with the organization process, it’s usually a good idea to somehow relate the database name to the name of the application.

Table

The table is a subset of data within the database, which contains a grouping of similar data items. For example, if a company wants to track data on employees, customers, and products, instead of having just one group of mixed-up data elements, the company would create four separate tables to hold the data:

  • An Employees table to hold data related to employees
  • A Customers table to hold data related to customers
  • A Products table to hold data related to products
  • An Orders table to track which products are in individual customer orders

The process of grouping application data into tables is called data normalization. Grouping similar data into its own table gives you more control over the data. For example, if you have a program that interfaces only with customer orders, you can give it permissions to only the Customers, Products, and Orders tables, leaving the Employees table safe from accidental exposure.

Data fields

You use data fields to hold individual data elements within a table. For example, the Employees table might contain data fields for an employee ID number, first name, last name, home address, salary, and employment start date. The data fields are the core of the application because they’re where the application actually stores data.

The table groups data fields into data records. Each data record is a single occurrence of values for each of the data fields. Figure 1-4 shows a diagram of how the Employees table might look.

image

FIGURE 1-4: An example of an Employees table layout.

Figure 1-4 shows the data fields as table headings. Each data record appears as a single line of data in the table (in this case, the information for a single employee). Because data is often displayed this way in a table, you’ll often hear the word row used to reference a single data record.

Database data types

Just as with variables in programming languages, databases need to identify the type of data stored in each data field so that it knows how much space to reserve to store the data, and how to handle the data. Table 1-1 shows the basic data types found in most relational database systems.

TABLE 1-1 Standard Database Data Types

Data Type

Description

int

A whole number between –2,147,483,648 and 2,147,483,647

float

A floating point number between –3.40283466E+38 and +3.40283466E+38

bool

A Boolean true or false value

date

A day value in the YYYY-MM-DD format

datetime

A day and time value displayed in YYYY-MM-DD HH:MM:SS format

char(x)

A fixed-length character string with x characters

varchar(x)

A variable-length character string with x or fewer characters

text

A variable-length character string of up to 65,536 characters stored as a binary value

Many relational database servers provide variations of these standard data types, such as small integer values or large text values, to help you customize exactly how much space to reserve for each data field. Unfortunately, these customized data types aren't necessarily standardized between relational databases.

Data constraints

Besides the data field name and value, a data field can be marked with special data constraints. Relational databases use data constraints to control how you place data into a data field. The most popular data constraint is the primary key.

A primary key defines the table data field(s) that uniquely identify each individual data record in the table. For example, if you’re retrieving an employee data record and your company has two employees named John Smith, you’ll run into a problem trying to get the correct data for the correct employee. To solve this problem, relational databases allow you to add a special data field to tell you which John Smith each data record refers to.

To do this, you must create an employee ID data field and assign a unique ID number to each employee. Because the new employee ID data field uniquely identifies each employee record, you can specify it as the primary key for the Employees table. The database server creates a separate hidden table relating the primary key values to data record numbers, and then uses it as an index to quickly retrieve the correct data record based on the primary key value.

Another popular data constraint you’ll run across is the is null restriction. If you set a data field with the is null data constraint, the database server will prevent you from entering a data record without a value in that data field.

Structured Query Language

The Structured Query Language (SQL) is a language for interacting with relational database systems that been around since the early 1970s. Over the years, other database vendors have tried to mimic or replace SQL with their own query languages. But despite their attempts, SQL still provides the easiest interface for both users and administrators to interact with any type of relational database system.

In 1986, the American National Standards Institute (ANSI) created the first SQL standards. The U.S. government adopted them as a federal standard and named it ANSI SQL89. Most commercial database vendors now use this SQL standard to interface with their products.

tip The SQL standard has been evolving over the years, with new standards being released to support new advanced database features. At the time of this writing, the most current standard is SQL:2016.

The SQL language specifies a format that you use to send commands to the database server. The SQL command format consists of:

  • A keyword: SQL keywords define the action the database server takes based on the SQL statement. The SQL standard defines lots of different keywords for performing lots of different actions. However, you'll find yourself just using a few standard keywords in your database programming, so it’s not all that hard to remember them. Table 1-2 lists the popular ones you’ll get to know.
  • An identifier: The SQL command identifier defines the database object used in a command. This is most often a database name, table name, or the names of data fields. The SQL identifiers help you select which data elements to retrieve from the database and which table to select them from.
  • One or more literals (optional): SQL command literals define specific data values referenced by the keyword. Literals are constant values, such as data values to insert into a table or data values used to search within the table data. You must enclose string literals in quotes (either single or double quotes), but you can use numerical values without quotes.

TABLE 1-2 SQL Keywords

Keyword

Description

DELETE

Removes a data record from a table

DROP

Removes a table or database

INSERT

Adds a new data record to a table

SELECT

Retrieves data records from a table

UPDATE

Modifies data within an existing data record in a table

The most common SQL command you'll use in your web applications is the query. A query is a SQL SELECT statement that searches the database for specific data records. Here’s the basic format of a SELECT statement:

SELECT datafields FROM table

The datafields parameter is a comma-separated list of the data field names you want the query to return. If you want to retrieve all the data field values for the data records, you use an asterisk as a wildcard character.

You must also specify the specific table you want the query to search. To get meaningful results, you must match your query data fields with the proper table.

tip SQL keywords are often identified with all capital letters in a SQL statement. MySQL allows you to use either uppercase or lowercase for keywords. I use all capitals in this book to help you identify the keywords within the SQL statements.

By default, the SELECT statement returns all the data records in the specified table. You can use one or more modifiers to define how MySQL returns the data requested by the query. Table 1-3 shows the more popular modifiers you'll run into with SQL queries.

TABLE 1-3 SQL Query Modifiers

Modifier

Description

LIMIT

Displays only a subset of the returned data records

ORDER BY

Displays data records in a specified order

WHERE

Displays a subset of data records that meet a specified condition

The WHERE clause is the most common SELECT statement modifier. It allows you to specify conditions to filter data from the table. For example:

SELECT lastname FROM Employees WHERE salary > 100000;

This SELECT statement only returns the last name of the employees with a salary of over $100,000.

tip Having to use SQL to interact with a database server can seem a bit overwhelming at first — you have to learn an entirely new programming language besides the languages you're learning to build your dynamic web application. Don’t fret, though. There are really only a handful of SQL statements that you’ll regularly use during the course of your application development. You’ll start remembering them in no time.

Presenting MySQL

The specific relational database server that I discuss in this book is the MySQL database server. The MySQL server is the most popular database server used in web applications — and for good reason. The following sections describe the features of the MySQL server that make it so popular.

MySQL features

The MySQL database server was created by David Axmark, Allan Larsson, and Michael Widenius as an upgrade to the mSQL database server and was first released for general use in 1996. It’s now owned and supported by Oracle but released as open-source software.

MySQL was originally created to incorporate indexing data to speed up data queries in the mSQL database server, by using the indexed sequential access method (ISAM). It did this by incorporating a special data management algorithm called the MyISAM storage engine. This proved to be a huge success.

MySQL was initially recognized for its speed of accessing data. The MyISAM data storage and indexing method proved to be a game changer in speeding up data access from other types of DBMS packages. It wasn’t long before the Internet world took notice, and MySQL became the DBMS package of choice for high-volume web applications.

These days, MySQL has evolved to do more than just fast data queries. Development is continually ongoing to add new features to MySQL. A short list of features includes the following:

  • It was written in C and C++ and has been compiled to run on many different platforms.
  • It incorporates a modular design approach to create a multi-layer server design.
  • It supports multi-threading, making it easily scalable to incorporate multiple CPUs if available.
  • It uses a thread-based memory allocation system.
  • It implements hash tables in memory to increase performance.
  • It supports client/server and embedded server environments.
  • It supports multiple data storage engines.
  • It implements all SQL functions using a class library.
  • It includes support for all standard SQL data types.
  • It offers a security system that supports both user-based and host-based verification.
  • It includes support for large databases using more than 5 billion rows of data.
  • It provides application programming interfaces (APIs) for many common programming languages (including PHP).
  • It incorporates many different character sets, allowing it to support many different languages.
  • It provides both command line and graphical tools for common database management.

Of these features, let’s take a closer look at two specific features to demonstrate the versatility of MySQL. The following sections dive into the ability for MySQL to support different database storage types, as well as how MySQL handles user authentication.

Storage engines

As shown in the preceding section, the MySQL server uses a modular approach to building the database server. One of those modules is how it stores and accesses database data. This is called the storage engine.

The storage engine is the gatekeeper to your data and all requests to your data go through it. The MySQL server incorporates several different types of storage engines, shown in Table 1-4.

TABLE 1-4 The MySQL Storage Engines

Storage Engine

Description

Archive

Produces a special-purpose table for inserting and retrieving data, but not updating or deleting it.

Blackhole

Accepts data but does not store it. Used for development testing.

CSV

Stores data in a comma-separated file format.

Federated

Allows data access from a remote server without using replication.

Example

A storage engine that does nothing. Used as a template for storage engine developers.

InnoDB

An advanced storage engine that balances high reliability and high performance.

Memory

Stores all data in memory for fast performance, but it doesn’t retain the data.

MyISAM

The initial MySQL storage engine, known for being fast with few advanced features.

The MyISAM storage engine is what made MySQL famous, but it’s no longer being developed by Oracle. The default and recommended storage engine for MySQL is now the InnoDB storage engine.

The InnoDB storage engine supports many advanced database features found in commercial databases, but initially it was known for not being all that fast. Developers had to decide which was more important to their application: performance or fancy database features.

However, work has been done by the MySQL developers to increase the performance of the InnoDB storage engine so that it comes close to the performance of the MyISAM storage engine. This gives you the best of both worlds — advanced database features and a high-performance storage engine, all as open-source software!

Data permissions

The MySQL database server handles access to database data using a two-tiered approach:

  • The user account assigned to a user
  • The location from where the user connects to the server

MySQL considers your identity from both the user account you use to log into the system, as well as the host from which you connect. That means you can control access to your data not only to specific user accounts, but from where the users happen to be when they log into the database server. For example, you can give a user account full access to a database when she logs in from the local server but restricted read-only access to the database when she logs in from a remote server.

MySQL does this by using an access control list (ACL) to define permissions to databases, tables, and special features based on the identities. When you create an identity in MySQL, you not only create a user account, but also specify the location from which the access control applies. You can use wildcards to allow users to have the same permissions from multiple locations.

MySQL uses a two-stage approach to verifying your database connection. First, MySQL accepts or rejects the connection request based on the user ID/password combination provided and whether the account is locked on the system. Then, if the connection is granted, MySQL accepts or rejects the access request based on database and table permissions.

A user account can have access to the database server, but not every database on the server. You can create separate user accounts for each application database that you create on the MySQL server. If your application requires more control, you can even create separate user accounts that have access to only certain tables within the same database!

As the database administrator you also have the ability to grant system-level privileges to user accounts, such as the ability to create new databases or even new user accounts.

warning The MySQL server has a single main administration user account named root. If you forget the password to the root user account that may or may not be recoverable, depending on your server setup and environment. It’s always a good idea to keep track of the root user account’s password, but also to protect it so that no one else can use it. If your system requires multiple administrators, give them each a separate user account and grant those user accounts elevated privileges on the database server so they can create databases and user accounts as needed.

Advanced MySQL Features

When you use the default InnoDB storage engine in MySQL, you have a wealth of advanced database feature available for your applications to utilize. This section walks through the more advanced features that the InnoDB storage engine brings to the MySQL world.

Handling transactions

All database servers allow users to enter database commands to query and manipulate data. What separates good database servers from bad ones is the way they handle commands.

The database engine processes commands as a single unit, called a transaction. A transaction represents a single data operation on the database. Most simplistic database servers treat each command received — such as adding a new record to a table or modifying an existing record in a table — as a separate transaction. Groups of commands create groups of transactions.

However, some advanced database servers (such as the MySQL with the InnoDB storage engine) allow you to perform more complicated transactions. In some instances, it’s necessary for an application to perform multiple commands as a result of a single action.

remember In a relational database, tables can be related to one another. This means that one table can contain data that is related to the data in another table. In the store example, the Orders table relied on data in both the Customers and Products tables. Although this makes organizing data easier, it makes managing transactions more difficult. A single action may require the database server to update several data values in several different tables.

In the store example, if a new customer comes into the store and purchases a laptop computer, the database server must modify three tables:

  • Add a new data record to the Customers table
  • Add a new data record to the Orders table
  • Modify the Products table to subtract one from the laptop inventory value

For the action to be complete, all three of these actions must succeed. If any one of the actions fails, the data will become corrupt. In an advanced database server, you can combine all these actions into a single transaction. If any one of the actions fails, the database server rolls back the other two actions to return the database to the previous condition. This feature is crucial to have available for your web applications!

Making sure your database is ACID compliant

Over the years, database experts have devised rules for how databases should handle transactions. The benchmark for all professional database systems is the ACID test. No, we’re not throwing the server into an acid bath; the ACID test is actually an acronym for a set of database features defining how the database server should support transactions:

  • Atomicity
  • Consistency
  • Isolation
  • Durability

The following sections examine these four features and discuss how MySQL implements them.

Atomicity

The atomicity feature states that for a transaction to be considered successful, all steps within the transaction must complete successfully. Either all the steps should be applied to the database, or none of them should. A transaction should not be allowed to complete partway.

To support atomicity, MySQL uses a system called commit and rollback. Database actions are only temporarily performed during a transaction. When it appears that all the actions in a transaction would complete successfully, the transaction is committed (the server applies all the actions to the database). If it appears that any one of the actions would fail, the entire transaction is rolled back (any previous steps that were successful are reversed). This ensures that the transaction is completed as a whole.

MySQL uses the two-phase commit approach to committing transactions. The two-phase commit performs the transaction using two steps (or phases):

  • Prepare phase: A transaction is analyzed to determine if the database is able to commit the entire transaction.
  • Commit phase: The transaction is physically committed to the database.

The two-phase commit approach allows MySQL to test all transaction commands during the prepare phase without having to modify any data in the actual tables. Table data is not changed until the commit phase is complete.

Consistency

The concept of consistency is a little more difficult than atomicity. The consistency feature states that every transaction should leave the database in a valid state. The tricky part here is what is considered a “valid state.”

Often, this feature is applied to how a database server handles unexpected crashes. If the database takes a power hit in the middle of the commit phase of a multi-action transaction, can it leave the tables in a state where the data makes sense?

MySQL utilizes two features to accomplish consistency:

  • Double-write buffering: With double-write buffering, before MySQL writes data to the actual tables, it stores the data in a buffer area. Only after all the transaction data is written to the buffer area will MySQL write the buffer area data to the actual table data files.
  • Crash recovery: If there is a system crash before the buffer area is completely written to the table files, MySQL can recover the buffer area using the crash recovery feature, which recovers submitted transactions from a transaction log file.

Isolation

The isolation feature is required for multiuser databases. When there is more than one person modifying data in a database, odd things can happen. If two people try to modify the same data value at the same time, who’s to say which value is the final value?

When more than one person tries to access the same data, the DBMS must act as the traffic cop, directing who gets access to the data first. Isolation ensures that each transaction in progress is invisible to any other transaction in progress. The DBMS must allow each transaction to complete and then decide which transaction value is the final value for the data. It accomplishes this task using a feature called locking.

Locking does what it says: It locks data while a transaction is being committed to the database. While the data is locked, other users can’t access the data, not even for queries. This prevents multiple users from querying or modifying the data while it’s in a locked mode.

There are two basic levels of locking that MySQL uses to support isolation:

  • Table-level locking: With table-level locking, any time a user requires a modification to a data record in a table, the DBMS locks the entire table, preventing other users from even viewing data in the table. As you can guess, this has an adverse effect on database performance, especially in environments where there is a lot of change to the data in the database. Early DBMS implementations used table-level locking exclusively.
  • Row-level locking: To solve the problems of table-level locking, many DBMS implementations (including the MySQL InnoDB storage engine) now incorporate row-level locking. With row-level locking, the DBMS locks only the data record that’s being modified. The rest of the table is available for other users to access.

Durability

The durability feature states that when a transaction is committed to the database, it must not be lost. This sounds like a simple concept, but in reality durability is often harder to ensure than it sounds.

Durability means being able to withstand both hardware and software failures. A database is useless if a power outage or server crash compromises the data stored in the database.

MySQL supports durability by incorporating multiple layers of protection. The same double-write buffer and crash recovery features mentioned for the consistency feature also apply to the durability feature. MySQL writes all transactions to a log file, writes the changes to the double-write buffer area, and then writes them to the actual database files. If the system crashes during this process, most of the time MySQL can recover the transaction within the process.

tip The onus of durability also rests on the database administrator. Having a good uninterruptable power supply (UPS) for your database server, as well as performing regular database backups, is crucial to ensuring your database tables are safe.

Examining the views

The SQL programming language provides developers with the ability to create some pretty complex queries, retrieving data from multiple tables in a single SQL statement. However, for queries that span more than a couple of tables, the SQL statement can become overly complex.

To help simplify complex query statements, some DBMS packages (including MySQL) allow administrators to create views. A view allows you to see (or view) data contained in separate database tables as if it were in a single table. Instead of having to write a sub-select query to grab data from multiple places, all the data is available in a single table view.

To a query, the view looks like any other database table. The DBMS can query views just like normal tables. A view does not use any disk space in the database files, because the DBMS generates the data in the view “on the fly” when a query tries to access the data. When the query is complete, the view data disappears. Figure 1-5 shows a sample view that you could create from the store database example.

image

FIGURE 1-5: A view of customer order information.

The view shown in Figure 1-5 incorporates some of the customer data from the Customers table, product data from the Products table, and order data from the Orders table. Queries can access all the fields in the view as if they belonged to a single table.

warning You can always use a view to read data, but you may or may not be able to use the view to insert new data or update existing data. It depends on the relationship between the data fields in the view. Data fields related in a one-to-one relationship can be inserted or updated, but data fields related in a one-to-many relationship can’t.

Working with stored procedures

A stored procedure is a set of SQL statements that are commonly used by applications. Instead of each application needing to submit the multiple SQL statements, you can create a stored procedure that contains the SQL statements and each application just needs to run the stored procedure.

Stored procedures can also help with the performance of the application, because less information needs to be sent between the client and the server (especially for long procedures). Stored procedures also allows you to create your own library of common functions in the database server to share among multiple applications. This helps you performance-tune queries and ensure all the applications use the same procedure to retrieve the data.

Pulling triggers

A trigger is a set of instructions that the DBMS performs on data based on an event in the table that contains the data. Events that can trigger the instructions are inserts, updates, or deletions of data contained in one or more tables. Here are the most common triggers you’ll see:

  • AFTER DELETE: Perform the set of instructions after a data record has been deleted from the table.
  • BEFORE DELETE: Perform the set of instructions before a data record is deleted from the table.
  • AFTER INSERT: Perform the set of instructions after a data record has been inserted into the table.
  • BEFORE INSERT: Perform the set of instructions before a data record is inserted into a table.
  • AFTER UPDATE: Perform the set of instructions after a data record is updated in the table.
  • BEFORE UPDATE: Perform the set of instructions before a data record is updated in the table.

Triggers help you maintain data integrity within your database tables by monitoring when data is changed and having the ability to change related data at the same time.

Working with blobs

Most database users are familiar with the common data types that you can store in a database. These include integers, floating point numbers, Boolean values, fixed-length character strings, and variable-length character strings. However, in the modern programming world, support for lots of other data types is necessary. It’s not uncommon to see web applications that are used to store and index pictures, audio clips, and even short videos. This type of data storage has forced many professional databases to devise a plan to store different types of data.

MySQL uses a special data type called the binary large object (BLOB) to store any type of binary data. You can enter a BLOB into a table the same as any other data type. This allows you to include support for any type of multimedia storage within applications and still use all the fast retrieval and indexing methods of the database.

warning Just because you can save large binary files in your tables doesn’t mean that it’s necessarily a good idea to do it. Large binary files can quickly fill a database disk space and slow down normal database queries. You’ll need to analyze your particular application requirements to determine if it’s better to store binary data inside the database or store the binary data outside as standard files, with just a pointer to the filename stored in the database.