Now that we have an autoloader in place, we can begin to remove all the include calls that only load up class and function definitions. When we are done, the only remaining include calls will be those that are executing logic. This will make it easier to see which include calls are forming the logic paths in our legacy application, and which are merely providing definitions.
We will start with a scenario where the codebase is structured relatively well. Afterwards, we will answer some questions related to layouts that are not so amenable to revision.
First, we will consolidate all the application classes to our central directory location as determined in the previous chapter. Doing so will put them where our autoloader can find them. Here is the general process we will follow:
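1. Find an include statement that pulls in a class definition file.
2. Move that class definition file to our central class directory location.
3. In the original file, and in every other file in the codebase where an include pulls in that class definition, remove that include statement.
4. Spot check the modified files to make sure the class is now autoloaded.
5. Commit, push, and notify QA.
6. Repeat until there are no more include calls that pull in class definitions.
For our examples, we will assume we have a legacy application with this partial file system layout: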
/path/to/app/
    classes/                # our central class directory location
        Mlaphp/
            Autoloader.php  # A hypothetical autoloader class
    foo/
        bar/
            baz.php         # a page script
    includes/               # a common "includes" directory
        setup.php           # setup code
    index.php               # a page script
    lib/                    # a directory with some classes in it
        sub/
            Auth.php        # class Auth { ... }
            Role.php        # class Role { ... }
            User.php        # class User { ... }
Your own legacy application may not match this exactly, but you get the idea.
We begin by picking a file, any file, then we examine it for include calls. The code therein might look like this:
<?php
require 'includes/setup.php';
require_once 'lib/sub/User.php';

// ...
$user = new User();
// ...
?>
We can see that there is a new User class being instantiated. On inspecting the lib/sub/User.php file, we can see it is the only class defined therein.
Having identified an include statement that loads a class definition, we now move that class definition file to the central class directory location so that our autoloader function can find it. The resulting file system layout now looks like this (note that User.php is now in classes/):
/path/to/app/
    classes/                # our central class directory location
        Mlaphp/
            Autoloader.php  # A hypothetical autoloader class
        User.php            # class User { ... }
    foo/
        bar/
            baz.php         # a page script
    includes/               # a common "includes" directory
        setup.php           # setup code
        db_functions.php    # a function definition file
    index.php               # a page script
    lib/                    # a directory with some classes in it
        sub/
            Auth.php        # class Auth { ... }
            Role.php        # class Role { ... }
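To see why the move matters, here is a minimal sketch of how an autoloader might resolve the class once User.php sits in the central class directory. It is not the Mlaphp autoloader from the layout above. The closure, the temporary directory standing in for classes/, and the empty User class are all assumptions made so the example can run on its own.

```php
<?php
// Hypothetical stand-in for the application's classes/ directory,
// created under the system temp directory so the sketch is self-contained.
$classDir = sys_get_temp_dir() . '/autoload_demo_' . uniqid() . '/classes';
mkdir($classDir, 0777, true);

// Simulate the moved class file: classes/User.php
file_put_contents($classDir . '/User.php', "<?php\nclass User {}\n");

// Minimal autoloader (an assumption, not the application's real one):
// map a class name to a file under the central class directory.
spl_autoload_register(function ($class) use ($classDir) {
    $file = $classDir . '/' . str_replace('\\', '/', $class) . '.php';
    if (is_file($file)) {
        require $file;
    }
});

// No include or require for User.php appears anywhere in this script.
$user = new User();
echo get_class($user), "\n";
```

Because the lookup is driven by the class name alone, any script that instantiates User gets the definition without naming the file, which is what makes the old include calls redundant.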
Now the problem is that our original file is trying to include the class file from its old location, a location that no longer exists. We need to remove that call from the code:
index.php
<?php
require 'includes/setup.php';

// ...
// the User class is now autoloaded
$user = new User();
// ...
?>
However, there are likely to be other places where the code attempts to load the now-missing lib/sub/User.php file.
This is where a project-wide search facility comes in handy. We have different options here, depending on your editor/IDE of choice and operating system. For example, we can use grep at the command line to search all the files in a particular directory and its subdirectories.
The point is to find all the include calls that refer to lib/sub/User.php. Because the include calls can be formed in different ways, we need to use a regular expression like this to search for them:
^[ \t]*(include|include_once|require|require_once).*User\.php
If you are not familiar with regular expressions, here is a breakdown of what we are looking for:
^              Starting at the beginning of each line,
[ \t]*         followed by zero or more spaces and/or tabs,
(include|...)  followed by any of these words,
.*             followed by any characters at all,
User\.php      followed by User.php, and we don't care what comes after.
(Regular expressions use . to mean any character, so we have to specify User\.php to indicate we mean a literal dot, not any character.)
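As a quick illustration of the breakdown above, the following sketch runs the same expression through PHP's preg_match against a handful of sample lines. The sample lines are hypothetical, and wrapping the expression in / delimiters is simply what PHP's PCRE functions require.

```php
<?php
// The search expression from the text, wrapped in PCRE delimiters.
$pattern = '/^[ \t]*(include|include_once|require|require_once).*User\.php/';

// Hypothetical sample lines, as they might appear in a legacy codebase.
$lines = array(
    "require_once 'lib/sub/User.php';",                 // include call for the file
    "    include 'User.php';",                          // leading whitespace is allowed
    "include_once(\"/usr/local/php/lib/User.php\");",   // a different User.php, still a candidate
    "\$user = new User();",                             // not an include call
    "require 'lib/sub/Userxphp';",                      // no literal dot before "php"
);

// Record whether each sample line matches the expression.
$matches = array();
foreach ($lines as $line) {
    $matches[] = (bool) preg_match($pattern, $line);
}

var_export($matches);
```

The first three lines match and the last two do not. The last line shows why the dot is escaped: without the backslash, Userxphp would match as well.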
If we use a regular expression search to find those strings in the legacy codebase, we will be presented with a list of all matching lines and their corresponding files. Unfortunately, it is up to us to examine each line to see if it really is a reference to the lib/sub/User.php file. For example, this line might turn up in the search results:
include_once("/usr/local/php/lib/User.php");
However, clearly it is not the User.php file we are looking for.
We could be more strict with our regular expression so that we search specifically for lib/sub/User.php, but that is more likely to miss some include calls, especially those in files under the lib/ or sub/ directories. For example, an include in a file in sub/ could look like this:
include 'User.php';
As such, it's better to be a little loose with the search to get every possible candidate, then work through the results manually.
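If no project-wide search tool is at hand, the loose search can be approximated in PHP itself. The sketch below is one possible approach, not a prescribed tool: findIncludeCandidates is a hypothetical helper, the "file:line: text" output format is an assumption, and it only scans files ending in .php, so a codebase that uses other extensions for includes would need that check adjusted.

```php
<?php
/**
 * Return "file:line: text" entries for every line under $dir that looks
 * like an include/require call mentioning $fileName.
 * (Hypothetical helper; the name and output format are assumptions.)
 */
function findIncludeCandidates($dir, $fileName)
{
    // Same loose expression as in the text, with the file name escaped
    // so that its dot is treated literally.
    $pattern = '/^[ \t]*(include|include_once|require|require_once).*'
        . preg_quote($fileName, '/') . '/';

    $found = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        // Assumption: only .php files are of interest.
        if ($file->getExtension() !== 'php') {
            continue;
        }
        $lines = file($file->getPathname(), FILE_IGNORE_NEW_LINES);
        foreach ($lines as $i => $line) {
            if (preg_match($pattern, $line)) {
                $found[] = $file->getPathname() . ':' . ($i + 1) . ': ' . trim($line);
            }
        }
    }

    sort($found);
    return $found;
}
```

Calling findIncludeCandidates('/path/to/app', 'User.php') returns every candidate line, including false positives, which still have to be reviewed by hand as described above.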
Examine each search result line, and if it is an include that pulls in the User class, remove it and save the file. Keep a list of each modified file, as we will need to test them later.
At the end of this, we will have removed all the include calls for that class throughout the codebase.
After removing the include statements for the given class, we now need to make sure the application works. Unfortunately, because we have no testing process in place, this means we need to pseudo-test or spot check by browsing to or otherwise invoking the modified files. In practice this is generally not difficult, but it is tedious.
When we spot check we are looking specifically for file not found and class not defined errors. These mean, respectively, that a file tried to include the missing class file, or that the autoloader failed to find the class file.
To do the testing we need to set PHP error reporting so that it either shows us the errors directly, or logs the errors to a file that we examine while testing the codebase. In addition, the error reporting level needs to be sufficiently strict that we actually see the errors. In general, error_reporting(E_ALL) is what we want, but because this is a legacy codebase, it may show more errors than we can bear (especially variable not defined notices). As such, it may be more productive to set error_reporting(E_ALL & ~E_NOTICE), which hides the notices but still reports the warnings and fatal errors we are looking for; a narrower level such as E_WARNING alone would suppress the fatal errors. The error reporting values can be set either in a setup or bootstrap file, or in the correct php.ini file.
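As a rough sketch, the settings could be applied near the top of a setup or bootstrap file along these lines. The log file path is hypothetical, and whether to display errors, log them, or both is a choice for each environment rather than something the process dictates.

```php
<?php
// Report everything while spot checking.
error_reporting(E_ALL);
// If notices are overwhelming, one alternative is to mask them out:
// error_reporting(E_ALL & ~E_NOTICE);

// Option 1: show errors directly in the output.
ini_set('display_errors', '1');

// Option 2: also log errors to a file we can watch while testing.
// The path below is a hypothetical example.
ini_set('log_errors', '1');
ini_set('error_log', sys_get_temp_dir() . '/legacy_app_errors.log');
```

Displaying errors is convenient on a development machine, while logging is the safer choice anywhere other people can see the pages.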
After the testing is complete and all errors have been fixed, commit the code to source control and (if needed) push it to the central code repository. If you have a QA team, now would be the time to notify them that a new testing round is needed, and provide them the list of files to test.
That is the process to convert a single class from include to autoloading. Go back through the codebase and find the next include that pulls in a class file and begin the process again. Continue doing so until all classes have been consolidated into the central class directory location and their relevant include lines have been removed. Yes, this is a tedious, tiresome, and time-consuming process, but it is a necessary step towards modernizing our legacy codebase.