If we placed our autoloader code in a class as a static or instance method, our search for include calls will reveal the inclusion of that class file. If we remove that include call, autoloading will fail, because the class file will not have been loaded. This is a chicken-and-egg problem. The solution is to leave the autoloader include in place as part of our bootstrapping or setup code. If we are fully diligent about removing include calls, that is likely to be the only include remaining in the codebase.
There are several ways to go about this. We could do the following:
- ... include those files.
- ... include call and look at the related file to see if it has class or function definitions.
Sometimes a class definition file may have more than one class definition in it. This can interfere with the autoloading process. If a file named Foo.php defines both Foo and Bar classes, then the Bar class will never be autoloaded, because the autoloader will look for a file named Bar.php, which does not exist.
The solution is to split the single file into multiple files. That is, create one file per class, and name each file for the class it contains per the PSR-0 naming and autoloading expectations.
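One way to spot files that need splitting is to count the class declarations in each one. The following is a minimal sketch rather than a tool from this book: the function name find_declared_classes() is hypothetical, and it relies on PHP's built-in tokenizer function token_get_all().

```php
<?php
/**
 * Returns the names of the classes declared in a string of PHP source code.
 * Hypothetical helper, for illustration only.
 */
function find_declared_classes($source)
{
    $classes = array();
    $tokens = token_get_all($source);
    $count = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        // we only care about the "class" keyword
        if (! is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }

        // skip any whitespace between the keyword and what follows it
        $next = $i + 1;
        while ($next < $count
            && is_array($tokens[$next])
            && $tokens[$next][0] === T_WHITESPACE
        ) {
            $next++;
        }

        // a declaration is followed by a name; this filters out uses such
        // as Foo::class and anonymous classes
        if ($next < $count
            && is_array($tokens[$next])
            && $tokens[$next][0] === T_STRING
        ) {
            $classes[] = $tokens[$next][1];
        }
    }

    return $classes;
}

// a file like the Foo.php described above, as a string for demonstration
$source = '<?php class Foo { } class Bar { }';
$found = find_declared_classes($source);
```

In practice we would pass in file_get_contents() of each class file. Any file that yields more than one name is a candidate for splitting.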
I sometimes hear complaints that the one-class-per-file rule is somehow wasteful or otherwise not aesthetically pleasing when examining the file system. Isn't it a drag on performance to load that many files? What if some classes are only needed along with some other class, such as an Exception that is only used in one place? I have some responses here:
If we feel that some classes naturally belong together, it is perfectly acceptable to place the subordinate or child classes in a subdirectory beneath the master or parent class. The subdirectory should be named for that higher class or namespace, per the PSR-0 naming rules.
For example, if we have a series of Exception classes related to a Foo class:
Foo.php                         # class Foo { ... }
Foo/
    NotFoundException.php       # class Foo_NotFoundException { ... }
    MalformedDataException.php  # class Foo_MalformedDataException { ... }
Renaming classes in this way means we must also change the related class names throughout the codebase, wherever they are instantiated or otherwise referenced.
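To make the relationship between a class name and its file location concrete, here is a small sketch of the underscore-to-directory conversion. The function name class_to_file() is hypothetical, and the sketch assumes class names without namespaces.

```php
<?php
/**
 * Maps an underscore-style class name to the file expected to define it.
 * Hypothetical helper; assumes class names without namespaces.
 */
function class_to_file($class, $dir)
{
    // convert underscores in the class name to directory separators
    $subpath = str_replace('_', DIRECTORY_SEPARATOR, $class);

    // prefix with the class directory and suffix with .php
    return $dir . DIRECTORY_SEPARATOR . $subpath . '.php';
}

$dir = '/path/to/app/classes';
$fooFile = class_to_file('Foo', $dir);
$exceptionFile = class_to_file('Foo_NotFoundException', $dir);
```

Under this convention, Foo resolves to Foo.php in the class directory, and Foo_NotFoundException resolves to NotFoundException.php inside the Foo/ subdirectory.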
I have seen cases where a page script has one or more classes or functions defined inside it, generally when the classes or functions are used only by that particular page script.
In these cases, remove the class definitions from the script and place them in their own files in the central class directory location. Be sure to name the files for their class names per the PSR-0 autoloader rules. Similarly, move the function definitions to their own related class file as static methods, and rename the function calls to static method calls.
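As an illustration of moving a page-script function into a class, consider the sketch below. The function format_price() and the class PageUtil are hypothetical names invented for this example, not code from a real application.

```php
<?php
// Before: defined inline in a page script (shown as a comment for contrast)
//
//     function format_price($cents) { ... }
//     $label = format_price(1999);

// After: classes/PageUtil.php (hypothetical class and method names)
class PageUtil
{
    public static function formatPrice($cents)
    {
        return '$' . number_format($cents / 100, 2);
    }
}

// After: the page script calls the static method instead
$label = PageUtil::formatPrice(1999);
```

Once the class lives in the central class directory, the autoloader can find it, and the page script no longer carries its own definitions.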
I have also seen the opposite case, where a class file has some logic that gets executed as a result of the file being loaded. For example, a class definition file might look like this:
/path/to/foo.php
<?php
echo "Doing something here ...";
log_to_file('a log entry');
db_query('UPDATE table_name SET incrementor = incrementor + 1');

class Foo
{
    // the class
}
?>
In the above case, the logic before the class definition will be executed when the file is loaded, even if the class is never instantiated or otherwise called.
This is a much tougher situation to deal with than when classes are defined inline with a page script. The class should be loadable without side effects, and the other logic should be executable without having to load the class.
In general, the easiest way to deal with this is to modify our relocation process. Cut the class definition from the original file and place it in its own file in the central class directory location. Leave the original file with its executable code in place, and leave all the related include calls in place as well. This allows us to pull out the class definition so it can be autoloaded, but scripts that include the original file still get the executable behavior.
For example, given the above combined executable code and class definition, we could end up with these two files:
/path/to/foo.php
<?php
echo "Doing something here ...";
log_to_file('a log entry');
db_query('UPDATE table_name SET incrementor = incrementor + 1');
?>
/path/to/app/classes/Foo.php
<?php
class Foo
{
    // the class
}
?>
This is messy, but it preserves the existing application behavior while allowing for autoloading.
When we start moving classes around, we may discover that application flow A uses a Foo class, and that application flow B also uses a Foo class, but the two classes of the same name are actually different classes defined in different files. They never conflict with each other because the two different application flows never intersect.
In this case, we have to rename one or both of the classes when we move them to our central class directory location. For example, call one of them FooOne and the other FooTwo, or pick better descriptive names of your own. Place them each in separate class files named for their class names, per the PSR-0 autoloading rules, and rename all references to these classes throughout the codebase.
When we consolidate our classes and functions, we may find some third-party libraries in the legacy application. We don't want to move or rename the classes and functions in a third-party library, because that would make it too difficult to upgrade the library later. We would have to remember what classes were moved where and which functions were renamed to what.
With any luck, the third-party library uses autoloading of some sort already. If it comes with its own autoloader, we can add that autoloader to the SPL autoloader registry stack in our setup or bootstrap code. If its autoloading is managed by another autoloader system, such as that found in Composer, we can add that autoloader to the SPL autoloader registry stack, again in our setup or bootstrap code.
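The sketch below shows how more than one autoloader can share the SPL autoloader stack. The two closures are stand-ins that only record that they were consulted. A real application autoloader and a real library autoloader would load files instead. The Composer path in the comment is hypothetical.

```php
<?php
// stand-ins for real autoloaders; each only records that it was consulted
$consulted = array();

$appAutoloader = function ($class) use (&$consulted) {
    $consulted[] = 'app autoloader: ' . $class;
};

$libraryAutoloader = function ($class) use (&$consulted) {
    $consulted[] = 'library autoloader: ' . $class;
};

// in setup or bootstrap code: ours first, then the library's
spl_autoload_register($appAutoloader);
spl_autoload_register($libraryAutoloader);

// a Composer-managed library usually needs only its generated file, which
// registers itself on the same stack (the path here is hypothetical):
// require '/path/to/vendor/autoload.php';

// asking for an unknown class walks the stack in registration order
class_exists('Some_Library_Class');
```

Because each registered autoloader is tried in turn until the class is defined, the library's loader only has to handle the library's own classes.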
If the third-party library does not use autoloading, and depends on include calls both in its own code and in the legacy application, we are in a bit of a bind. We don't want to modify the code in the library, but at the same time we want to remove include calls from the legacy application. The two solutions here are least-worst options:
Both of these options are beyond the scope of this book. You will need to examine the library in question, determine its class naming scheme, and come up with appropriate autoloader code on your own.
Finally, in terms of how to organize third-party libraries in the legacy application, it might be wise to consolidate them all to their own central location in the codebase. For example, this might be under a directory called 3rdparty/ or external_libs/. If we move a library, we should move the entire package, not just its class files, so we can upgrade it properly later. This will also allow us to exclude the central third-party directory from our search for include calls so that we don't get extra search results from files that we don't want to modify.
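As a rough sketch of such a search, the helper below walks a directory tree and reports include and require calls while skipping excluded directories. The function name find_include_calls() is hypothetical. The pattern match is deliberately naive, so it will also report matches inside comments and strings.

```php
<?php
/**
 * Lists "file:line" locations of include/require calls under $root,
 * skipping any directory whose name appears in $excludeDirs.
 * Hypothetical helper, for illustration only.
 */
function find_include_calls($root, array $excludeDirs)
{
    $found = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        // only PHP files are of interest
        if ($file->getExtension() !== 'php') {
            continue;
        }

        // skip files that live under an excluded directory
        $relative = substr($file->getPathname(), strlen($root));
        $parts = explode(DIRECTORY_SEPARATOR, $relative);
        if (array_intersect($parts, $excludeDirs)) {
            continue;
        }

        // report each line that mentions include(_once) or require(_once)
        foreach (file($file->getPathname()) as $index => $line) {
            if (preg_match('/\b(?:include|require)(?:_once)?\b/', $line)) {
                $found[] = $file->getPathname() . ':' . ($index + 1);
            }
        }
    }

    sort($found);
    return $found;
}
```

A call such as find_include_calls('/path/to/app', array('3rdparty', 'external_libs')) would then list only the include calls in code we intend to modify.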
System-wide library collections, like those provided by Horde and PEAR, are a special case of third-party libraries. They are generally located on the server file system outside of the legacy application so they can be available to all applications running on that server. The include statements related to these system-wide libraries generally depend on the include_path settings, or else are referenced by absolute path.
These present a special problem when trying to eliminate include calls that only pull in class and function definitions. If we are lucky enough to be using PEAR-installed libraries, we can modify our existing autoloader to look in two directories instead of one. This works because the PSR-0 naming conventions grew out of the Horde/PEAR conventions. The trailing autoloader code changes from this:
<?php
// convert underscores in the class name to directory separators
$subpath .= str_replace('_', DIRECTORY_SEPARATOR, $class);

// the path to our central class directory location
$dir = '/path/to/app/classes';

// prefix with the central directory location and suffix with .php,
// then require it.
require $dir . DIRECTORY_SEPARATOR . $subpath . '.php';
?>
To this:
<?php
// convert underscores in the class name to directory separators
$subpath .= str_replace('_', DIRECTORY_SEPARATOR, $class);

// the paths to our central class directory location and to PEAR
$dirs = array('/path/to/app/classes', '/usr/local/pear/php');
foreach ($dirs as $dir) {
    $file = $dir . DIRECTORY_SEPARATOR . $subpath . '.php';
    if (file_exists($file)) {
        require $file;
        // stop at the first match so the class is not defined twice
        break;
    }
}
?>
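To see the multi-directory lookup in context, here is a self-contained sketch of a complete autoloader built around the same idea. The function name make_autoloader() is hypothetical, and the sketch handles only underscore-style class names, not namespaces.

```php
<?php
/**
 * Builds an autoloader that looks for a class file in each directory in turn.
 * Hypothetical helper; it handles only underscore-style class names.
 */
function make_autoloader(array $dirs)
{
    return function ($class) use ($dirs) {
        // convert underscores in the class name to directory separators
        $subpath = str_replace('_', DIRECTORY_SEPARATOR, $class);

        foreach ($dirs as $dir) {
            $file = $dir . DIRECTORY_SEPARATOR . $subpath . '.php';
            if (file_exists($file)) {
                // load the first match and stop looking
                require $file;
                return true;
            }
        }

        // not found here; let any other registered autoloader try
        return false;
    };
}

// in setup or bootstrap code
spl_autoload_register(make_autoloader(array(
    '/path/to/app/classes',
    '/usr/local/pear/php',
)));
```

Listing the application directory first means our own classes take precedence if a class file of the same name exists in both locations.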
When we consolidated user-defined global functions into classes, we redefined them as static methods. This left their global scope unchanged. If we feel particularly diligent, we can change them from static to instance methods. This involves more work, but in the end it can make testing easier and is a cleaner technical approach. Given our earlier Db example, using instance instead of static methods would look like this:
classes/Db.php
<?php
class Db
{
    public function query($query_string)
    {
        // ... code to perform a query ...
    }

    public function getRow($query_string)
    {
        // ... code to get the first result row ...
    }

    public function getCol($query_string)
    {
        // ... code to get the first column of results ...
    }
}
?>
The only added step when using instance methods instead of static ones is that we need to instantiate the class before calling its methods. That is, instead of this:
<?php
Db::query(...);
?>
We would do this:
<?php
$db = new Db();
$db->query(...);
?>
Even though it is more work in the beginning, I recommend instance methods over static ones. Among other things, they give us a constructor method that is called on instantiation, and they make testing easier in many cases.
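The sketch below illustrates both points. The constructor argument $dsn, the FakeDb class, and the list_user_names() function are hypothetical additions for this example. The Db class shown earlier takes no constructor arguments.

```php
<?php
class Db
{
    protected $dsn;

    // hypothetical constructor argument; the earlier Db example takes none
    public function __construct($dsn)
    {
        $this->dsn = $dsn;
    }

    public function getCol($query_string)
    {
        // ... code to get the first column of results ...
        return array();
    }
}

// a stand-in for tests: same methods, canned results, no database needed
class FakeDb extends Db
{
    public function __construct()
    {
    }

    public function getCol($query_string)
    {
        return array('alice', 'bob');
    }
}

// hypothetical application code that receives its Db instance
function list_user_names(Db $db)
{
    return implode(', ', $db->getCol('SELECT name FROM users'));
}

$names = list_user_names(new FakeDb());
```

With static methods, list_user_names() would be tied to Db::getCol() and a real database. With an instance, a test can hand it a substitute object instead.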
If you like, you can start by converting to static methods, and then later convert the static methods to instance methods, along with all the related method calls. Your schedule and preferences will dictate which approach you choose.
As I have noted before, this is a tedious, tiresome, and time-consuming process. Depending on the size of the codebase, it may take days or weeks of effort to fully consolidate the classes and functions for autoloading. It would be great if there were some way to automate the process to make it both faster and more reliable.
Unfortunately, I have not yet discovered any tools that make this process easier. As far as I can tell, this kind of refactoring is still best done by hand with strong attention to detail. Obsessive tendencies and long periods of uninterrupted concentration are likely to help here.