Chapter 1
IN THIS CHAPTER
Understanding why you need a database
Seeing how MySQL works
Exploring the advanced features of MySQL
Computers are all about storing information. However, unlike that junk drawer in your kitchen that contains multiple shards of paper with names and phone numbers scribbled on them, you want to store your dynamic web application data in an orderly fashion. After all, you wouldn’t want to mix up the data from your astrophysics experiments with your bowling league scores!
The MySQL database server provides a user-friendly platform for you to organize your application data, making it simple to identify which data belongs to which application and easy for the application to access the data, all while maintaining security so the right people can only get to the right data. This chapter describes just why you need a database for your dynamic web applications, and why you should choose the MySQL database server.
With PHP, you have a few different options for storing persistent data in your application to retrieve at a later time. One method is to use the PHP file system functions to create a standard text file on the server to store the application data, and then read the data back as necessary.
One downside to using standard text files to store your application data is that it’s hard to find a specific data item buried in the text file. Standard text files are often called “flat files” because you can’t create any type of relationships in the data to make searching for specific information easier. Your application must open the text file and read each line until it finds the data it needs. That’s fine for small amounts of data, but for large amounts of data that can be slow, especially if there are thousands of site visitors all trying to access their data from the same file at the same time.
To solve that problem, most web developers have turned to using databases. Databases organize data in a manner making it easier for the database server to insert, find, modify, and delete data. There are lots of different database types available, but one of the most popular is the relational database system. This section describes how relational databases work with data to help speed up your web application.
Microsoft Access is by far the most popular end-user database tool developed for commercial use. Many Windows users, from professional accounts to bowling league secretaries, use Access to track data. It provides an easy, intuitive user interface, allowing novice computer users to quickly produce queries and reports with little effort.
However, despite its user-friendliness, Access has its limitations. To fully understand how MySQL differs from Access, you must first understand how database systems are organized.
There is more to a database than just a bunch of data files. Most databases incorporate several layers of files, programs, and utilities, which all interact to provide the database experience. The whole package is referred to as a database management system (DBMS).
There are different types of DBMS packages, but they all basically contain the following parts:
The database engine is the heart and brains of the DBMS. It controls all access to the data, which is stored in the database files. Any application (including the DBMS itself) that requires access to data must go through the database engine (see Figure 1-1).
FIGURE 1-1: A simple database management system.
The database engine uses an internal data dictionary to define how the database operates, the type of data that can be stored in the database files, and the structure of the database. It basically defines the rules used for the DBMS. Each DBMS has its own data dictionary.
If you’re a user running a simple database on Access, you probably don’t even realize you’re using a database engine. Access keeps much of the DBMS work under the hood and away from users. When you start Access, the database engine starts, and when you stop Access, the database engine stops.
In MySQL, the database engine runs as a service that is always running in the background on the server. Users run separate application programs that interface with the database engine while it’s running. Each application can send queries to the database engine and process the results returned. When the application stops, the MySQL database engine continues to run, waiting for commands from other applications.
Both Access and MySQL require one or more database files to be present to hold data. If you work with Access, you’ve seen the .mdb
database files. These files contain the data defined in tables created in the Access database. Each database has its own .mdb
file.
In the Access environment, if two or more applications want to share a database, the database file must be located on a shared network drive available to all the applications. Each application has a copy of the Access database engine program running on the local workstation, which points to the common database file, as shown in Figure 1-2.
FIGURE 1-2: A shared Microsoft Access environment.
Where this model falls apart is that there are multiple database engines, all trying to access the database files across a network environment. This generates large amounts of data on the network and slows down the performance of the individual database engines.
In the MySQL model, the database engine and database files are always on the same computer. Queries and reports run from separate applications, but they all send requests to the common database engine, as shown in Figure 1-3.
FIGURE 1-3: A multiuser MySQL environment.
As you can see from Figure 1-3, the MySQL database engine accepts data requests from multiple users across the network. All the database access is performed on the local system running the MySQL server, so the data interaction with the database files stays on the local system. The database engine only sends the query or report results across the network to the applications.
This feature alone makes using MySQL a better database choice in multiuser database projects.
Databases are all about arranging data to make finding information faster. Relational database theory arranges data in three levels: databases, tables, and data fields.
A database groups related data into a single container. The database is the highest level or grouping of data on the relational database server. The server allows you to create multiple databases, all accessible from the same server service running on the server.
The table is a subset of data within the database, which contains a grouping of similar data items. For example, if a company wants to track data on employees, customers, and products, instead of having just one group of mixed-up data elements, the company would create four separate tables to hold the data:
The process of grouping application data into tables is called data normalization. Grouping similar data into its own table gives you more control over the data. For example, if you have a program that interfaces only with customer orders, you can give it permissions to only the Customers, Products, and Orders tables, leaving the Employees table safe from accidental exposure.
You use data fields to hold individual data elements within a table. For example, the Employees table might contain data fields for an employee ID number, first name, last name, home address, salary, and employment start date. The data fields are the core of the application because they’re where the application actually stores data.
The table groups data fields into data records. Each data record is a single occurrence of values for each of the data fields. Figure 1-4 shows a diagram of how the Employees table might look.
FIGURE 1-4: An example of an Employees table layout.
Figure 1-4 shows the data fields as table headings. Each data record appears as a single line of data in the table (in this case, the information for a single employee). Because data is often displayed this way in a table, you’ll often hear the word row used to reference a single data record.
Just as with variables in programming languages, databases need to identify the type of data stored in each data field so that it knows how much space to reserve to store the data, and how to handle the data. Table 1-1 shows the basic data types found in most relational database systems.
TABLE 1-1 Standard Database Data Types
Data Type |
Description |
|
A whole number between –2,147,483,648 and 2,147,483,647 |
|
A floating point number between –3.40283466E+38 and +3.40283466E+38 |
|
A Boolean true or false value |
|
A day value in the YYYY-MM-DD format |
|
A day and time value displayed in YYYY-MM-DD HH:MM:SS format |
|
A fixed-length character string with x characters |
|
A variable-length character string with x or fewer characters |
|
A variable-length character string of up to 65,536 characters stored as a binary value |
Many relational database servers provide variations of these standard data types, such as small integer values or large text values, to help you customize exactly how much space to reserve for each data field. Unfortunately, these customized data types aren't necessarily standardized between relational databases.
Besides the data field name and value, a data field can be marked with special data constraints. Relational databases use data constraints to control how you place data into a data field. The most popular data constraint is the primary key.
A primary key defines the table data field(s) that uniquely identify each individual data record in the table. For example, if you’re retrieving an employee data record and your company has two employees named John Smith, you’ll run into a problem trying to get the correct data for the correct employee. To solve this problem, relational databases allow you to add a special data field to tell you which John Smith each data record refers to.
To do this, you must create an employee ID data field and assign a unique ID number to each employee. Because the new employee ID data field uniquely identifies each employee record, you can specify it as the primary key for the Employees table. The database server creates a separate hidden table relating the primary key values to data record numbers, and then uses it as an index to quickly retrieve the correct data record based on the primary key value.
Another popular data constraint you’ll run across is the is null
restriction. If you set a data field with the is null
data constraint, the database server will prevent you from entering a data record without a value in that data field.
The Structured Query Language (SQL) is a language for interacting with relational database systems that been around since the early 1970s. Over the years, other database vendors have tried to mimic or replace SQL with their own query languages. But despite their attempts, SQL still provides the easiest interface for both users and administrators to interact with any type of relational database system.
In 1986, the American National Standards Institute (ANSI) created the first SQL standards. The U.S. government adopted them as a federal standard and named it ANSI SQL89. Most commercial database vendors now use this SQL standard to interface with their products.
The SQL language specifies a format that you use to send commands to the database server. The SQL command format consists of:
TABLE 1-2 SQL Keywords
Keyword |
Description |
|
Removes a data record from a table |
|
Removes a table or database |
|
Adds a new data record to a table |
|
Retrieves data records from a table |
|
Modifies data within an existing data record in a table |
The most common SQL command you'll use in your web applications is the query. A query is a SQL SELECT
statement that searches the database for specific data records. Here’s the basic format of a SELECT
statement:
SELECT datafields FROM table
The datafields
parameter is a comma-separated list of the data field names you want the query to return. If you want to retrieve all the data field values for the data records, you use an asterisk as a wildcard character.
You must also specify the specific table
you want the query to search. To get meaningful results, you must match your query data fields with the proper table.
By default, the SELECT
statement returns all the data records in the specified table. You can use one or more modifiers to define how MySQL returns the data requested by the query. Table 1-3 shows the more popular modifiers you'll run into with SQL queries.
TABLE 1-3 SQL Query Modifiers
Modifier |
Description |
|
Displays only a subset of the returned data records |
|
Displays data records in a specified order |
|
Displays a subset of data records that meet a specified condition |
The WHERE
clause is the most common SELECT
statement modifier. It allows you to specify conditions to filter data from the table. For example:
SELECT lastname FROM Employees WHERE salary > 100000;
This SELECT
statement only returns the last name of the employees with a salary of over $100,000.
The specific relational database server that I discuss in this book is the MySQL database server. The MySQL server is the most popular database server used in web applications — and for good reason. The following sections describe the features of the MySQL server that make it so popular.
The MySQL database server was created by David Axmark, Allan Larsson, and Michael Widenius as an upgrade to the mSQL database server and was first released for general use in 1996. It’s now owned and supported by Oracle but released as open-source software.
MySQL was originally created to incorporate indexing data to speed up data queries in the mSQL database server, by using the indexed sequential access method (ISAM). It did this by incorporating a special data management algorithm called the MyISAM storage engine. This proved to be a huge success.
MySQL was initially recognized for its speed of accessing data. The MyISAM data storage and indexing method proved to be a game changer in speeding up data access from other types of DBMS packages. It wasn’t long before the Internet world took notice, and MySQL became the DBMS package of choice for high-volume web applications.
These days, MySQL has evolved to do more than just fast data queries. Development is continually ongoing to add new features to MySQL. A short list of features includes the following:
Of these features, let’s take a closer look at two specific features to demonstrate the versatility of MySQL. The following sections dive into the ability for MySQL to support different database storage types, as well as how MySQL handles user authentication.
As shown in the preceding section, the MySQL server uses a modular approach to building the database server. One of those modules is how it stores and accesses database data. This is called the storage engine.
The storage engine is the gatekeeper to your data and all requests to your data go through it. The MySQL server incorporates several different types of storage engines, shown in Table 1-4.
TABLE 1-4 The MySQL Storage Engines
Storage Engine |
Description |
Archive |
Produces a special-purpose table for inserting and retrieving data, but not updating or deleting it. |
Blackhole |
Accepts data but does not store it. Used for development testing. |
CSV |
Stores data in a comma-separated file format. |
Federated |
Allows data access from a remote server without using replication. |
Example |
A storage engine that does nothing. Used as a template for storage engine developers. |
InnoDB |
An advanced storage engine that balances high reliability and high performance. |
Memory |
Stores all data in memory for fast performance, but it doesn’t retain the data. |
MyISAM |
The initial MySQL storage engine, known for being fast with few advanced features. |
The MyISAM storage engine is what made MySQL famous, but it’s no longer being developed by Oracle. The default and recommended storage engine for MySQL is now the InnoDB storage engine.
The InnoDB storage engine supports many advanced database features found in commercial databases, but initially it was known for not being all that fast. Developers had to decide which was more important to their application: performance or fancy database features.
However, work has been done by the MySQL developers to increase the performance of the InnoDB storage engine so that it comes close to the performance of the MyISAM storage engine. This gives you the best of both worlds — advanced database features and a high-performance storage engine, all as open-source software!
The MySQL database server handles access to database data using a two-tiered approach:
MySQL considers your identity from both the user account you use to log into the system, as well as the host from which you connect. That means you can control access to your data not only to specific user accounts, but from where the users happen to be when they log into the database server. For example, you can give a user account full access to a database when she logs in from the local server but restricted read-only access to the database when she logs in from a remote server.
MySQL does this by using an access control list (ACL) to define permissions to databases, tables, and special features based on the identities. When you create an identity in MySQL, you not only create a user account, but also specify the location from which the access control applies. You can use wildcards to allow users to have the same permissions from multiple locations.
MySQL uses a two-stage approach to verifying your database connection. First, MySQL accepts or rejects the connection request based on the user ID/password combination provided and whether the account is locked on the system. Then, if the connection is granted, MySQL accepts or rejects the access request based on database and table permissions.
A user account can have access to the database server, but not every database on the server. You can create separate user accounts for each application database that you create on the MySQL server. If your application requires more control, you can even create separate user accounts that have access to only certain tables within the same database!
As the database administrator you also have the ability to grant system-level privileges to user accounts, such as the ability to create new databases or even new user accounts.
When you use the default InnoDB storage engine in MySQL, you have a wealth of advanced database feature available for your applications to utilize. This section walks through the more advanced features that the InnoDB storage engine brings to the MySQL world.
All database servers allow users to enter database commands to query and manipulate data. What separates good database servers from bad ones is the way they handle commands.
The database engine processes commands as a single unit, called a transaction. A transaction represents a single data operation on the database. Most simplistic database servers treat each command received — such as adding a new record to a table or modifying an existing record in a table — as a separate transaction. Groups of commands create groups of transactions.
However, some advanced database servers (such as the MySQL with the InnoDB storage engine) allow you to perform more complicated transactions. In some instances, it’s necessary for an application to perform multiple commands as a result of a single action.
In the store example, if a new customer comes into the store and purchases a laptop computer, the database server must modify three tables:
For the action to be complete, all three of these actions must succeed. If any one of the actions fails, the data will become corrupt. In an advanced database server, you can combine all these actions into a single transaction. If any one of the actions fails, the database server rolls back the other two actions to return the database to the previous condition. This feature is crucial to have available for your web applications!
Over the years, database experts have devised rules for how databases should handle transactions. The benchmark for all professional database systems is the ACID test. No, we’re not throwing the server into an acid bath; the ACID test is actually an acronym for a set of database features defining how the database server should support transactions:
The following sections examine these four features and discuss how MySQL implements them.
The atomicity feature states that for a transaction to be considered successful, all steps within the transaction must complete successfully. Either all the steps should be applied to the database, or none of them should. A transaction should not be allowed to complete partway.
To support atomicity, MySQL uses a system called commit and rollback. Database actions are only temporarily performed during a transaction. When it appears that all the actions in a transaction would complete successfully, the transaction is committed (the server applies all the actions to the database). If it appears that any one of the actions would fail, the entire transaction is rolled back (any previous steps that were successful are reversed). This ensures that the transaction is completed as a whole.
MySQL uses the two-phase commit approach to committing transactions. The two-phase commit performs the transaction using two steps (or phases):
The two-phase commit approach allows MySQL to test all transaction commands during the prepare phase without having to modify any data in the actual tables. Table data is not changed until the commit phase is complete.
The concept of consistency is a little more difficult than atomicity. The consistency feature states that every transaction should leave the database in a valid state. The tricky part here is what is considered a “valid state.”
Often, this feature is applied to how a database server handles unexpected crashes. If the database takes a power hit in the middle of the commit phase of a multi-action transaction, can it leave the tables in a state where the data makes sense?
MySQL utilizes two features to accomplish consistency:
The isolation feature is required for multiuser databases. When there is more than one person modifying data in a database, odd things can happen. If two people try to modify the same data value at the same time, who’s to say which value is the final value?
When more than one person tries to access the same data, the DBMS must act as the traffic cop, directing who gets access to the data first. Isolation ensures that each transaction in progress is invisible to any other transaction in progress. The DBMS must allow each transaction to complete and then decide which transaction value is the final value for the data. It accomplishes this task using a feature called locking.
Locking does what it says: It locks data while a transaction is being committed to the database. While the data is locked, other users can’t access the data, not even for queries. This prevents multiple users from querying or modifying the data while it’s in a locked mode.
There are two basic levels of locking that MySQL uses to support isolation:
The durability feature states that when a transaction is committed to the database, it must not be lost. This sounds like a simple concept, but in reality durability is often harder to ensure than it sounds.
Durability means being able to withstand both hardware and software failures. A database is useless if a power outage or server crash compromises the data stored in the database.
MySQL supports durability by incorporating multiple layers of protection. The same double-write buffer and crash recovery features mentioned for the consistency feature also apply to the durability feature. MySQL writes all transactions to a log file, writes the changes to the double-write buffer area, and then writes them to the actual database files. If the system crashes during this process, most of the time MySQL can recover the transaction within the process.
The SQL programming language provides developers with the ability to create some pretty complex queries, retrieving data from multiple tables in a single SQL statement. However, for queries that span more than a couple of tables, the SQL statement can become overly complex.
To help simplify complex query statements, some DBMS packages (including MySQL) allow administrators to create views. A view allows you to see (or view) data contained in separate database tables as if it were in a single table. Instead of having to write a sub-select query to grab data from multiple places, all the data is available in a single table view.
To a query, the view looks like any other database table. The DBMS can query views just like normal tables. A view does not use any disk space in the database files, because the DBMS generates the data in the view “on the fly” when a query tries to access the data. When the query is complete, the view data disappears. Figure 1-5 shows a sample view that you could create from the store database example.
FIGURE 1-5: A view of customer order information.
The view shown in Figure 1-5 incorporates some of the customer data from the Customers table, product data from the Products table, and order data from the Orders table. Queries can access all the fields in the view as if they belonged to a single table.
A stored procedure is a set of SQL statements that are commonly used by applications. Instead of each application needing to submit the multiple SQL statements, you can create a stored procedure that contains the SQL statements and each application just needs to run the stored procedure.
Stored procedures can also help with the performance of the application, because less information needs to be sent between the client and the server (especially for long procedures). Stored procedures also allows you to create your own library of common functions in the database server to share among multiple applications. This helps you performance-tune queries and ensure all the applications use the same procedure to retrieve the data.
A trigger is a set of instructions that the DBMS performs on data based on an event in the table that contains the data. Events that can trigger the instructions are inserts, updates, or deletions of data contained in one or more tables. Here are the most common triggers you’ll see:
AFTER DELETE
: Perform the set of instructions after a data record has been deleted from the table.BEFORE DELETE
: Perform the set of instructions before a data record is deleted from the table.AFTER INSERT
: Perform the set of instructions after a data record has been inserted into the table.BEFORE INSERT
: Perform the set of instructions before a data record is inserted into a table.AFTER UPDATE
: Perform the set of instructions after a data record is updated in the table.BEFORE UPDATE
: Perform the set of instructions before a data record is updated in the table.Triggers help you maintain data integrity within your database tables by monitoring when data is changed and having the ability to change related data at the same time.
Most database users are familiar with the common data types that you can store in a database. These include integers, floating point numbers, Boolean values, fixed-length character strings, and variable-length character strings. However, in the modern programming world, support for lots of other data types is necessary. It’s not uncommon to see web applications that are used to store and index pictures, audio clips, and even short videos. This type of data storage has forced many professional databases to devise a plan to store different types of data.
MySQL uses a special data type called the binary large object (BLOB) to store any type of binary data. You can enter a BLOB into a table the same as any other data type. This allows you to include support for any type of multimedia storage within applications and still use all the fast retrieval and indexing methods of the database.