Common Questions

Should we remove the autoloader include call?

If we placed our autoloader code in a class as a static or instance method, our search for include calls will reveal the inclusion of that class file. If we remove that include call, autoloading will fail, because the autoloader class file will never have been loaded. This is a chicken-and-egg problem. The solution is to leave the autoloader include in place as part of our bootstrapping or setup code. If we are fully diligent about removing include calls, that is likely to be the only include remaining in the codebase.
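As a minimal sketch of that bootstrap, assume the autoloader lives in a hypothetical `Autoloader` class with a static `load()` method, and that the central class directory is `/path/to/app/classes`. In a real codebase the class would sit in its own file and be pulled in by the one include we keep. It is defined inline here only so the example runs on its own.

```php
<?php
// In the real bootstrap this class lives in its own file and is loaded by
// the single include we keep, e.g.:
//   require __DIR__ . '/classes/Autoloader.php';
// It cannot be autoloaded itself, because no autoloader exists yet.
class Autoloader
{
    // Central class directory (hypothetical path).
    public static $dir = '/path/to/app/classes';

    public static function load($class)
    {
        // PSR-0 style: underscores in the class name become directory separators.
        $file = self::$dir . DIRECTORY_SEPARATOR
              . str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
        if (file_exists($file)) {
            require $file;
        }
    }
}

// Put the static method on the SPL autoloader stack. From here on, class
// files no longer need their own include calls.
spl_autoload_register(array('Autoloader', 'load'));
```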

How should we pick files for candidate include calls?

There are several ways to go about this. We could do the following:

We can manually traverse the entire codebase and work file-by-file.
We can generate a list of class and function definition files, and then generate a list of files that include those files.
We can search for every include call and look at the related file to see if it has class or function definitions.
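For the third approach, a rough search helper can do the first pass. The sketch below is an assumption of ours, not a tool from the text: the function names are hypothetical, and a plain regex like this behaves much like `grep`, so it will also match the words inside comments and strings. The `$excludeDirs` parameter lets us skip a central third-party directory.

```php
<?php
/**
 * Return the include/require lines in a chunk of PHP source, keyed by
 * 1-based line number. This is a coarse, grep-like text match.
 */
function findIncludeLines($source)
{
    $found = array();
    foreach (preg_split('/\R/', $source) as $i => $line) {
        if (preg_match('/\b(include|include_once|require|require_once)\b/i', $line)) {
            $found[$i + 1] = trim($line);
        }
    }
    return $found;
}

/**
 * Walk a directory tree and collect include lines from every .php file,
 * skipping files under any directory named in $excludeDirs.
 */
function findIncludesInTree($root, array $excludeDirs = array())
{
    $results = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        $path = $file->getPathname();
        if (substr($path, -4) !== '.php') {
            continue;
        }
        foreach ($excludeDirs as $dir) {
            // skip this file if it sits inside an excluded directory
            if (strpos($path, DIRECTORY_SEPARATOR . $dir . DIRECTORY_SEPARATOR) !== false) {
                continue 2;
            }
        }
        $matches = findIncludeLines(file_get_contents($path));
        if ($matches) {
            $results[$path] = $matches;
        }
    }
    return $results;
}
```

Each hit still needs a human look at the included file to decide whether it only defines classes or functions.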

What if an include defines more than one class?

Sometimes a class definition file contains more than one class definition. This can interfere with the autoloading process. If a file named Foo.php defines both Foo and Bar classes, then the Bar class will never be autoloaded, because the autoloader will look for a file named Bar.php, which does not exist.

The solution is to split the single file into multiple files. That is, create one file per class, and name each file for the class it contains per the PSR-0 naming and autoloading expectations.
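To find files that need splitting, we might scan each candidate file with PHP's tokenizer extension (assumed to be enabled, as it usually is). The function below is a hypothetical helper: it lists the named class declarations in a chunk of source, ignoring interfaces, traits, anonymous classes, and `Foo::class` constants. Any file for which it returns more than one name is a candidate for splitting.

```php
<?php
/**
 * List the names of classes declared in a chunk of PHP source.
 */
function declaredClassNames($source)
{
    $tokens = token_get_all($source);
    $names = array();
    $count = count($tokens);
    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }
        // skip any whitespace after the "class" keyword
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        // a named declaration has an identifier right after "class";
        // Foo::class and "new class" do not, so they are skipped
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
            $names[] = $tokens[$j][1];
        }
    }
    return $names;
}

// Usage: flag a file that declares more than one class.
// if (count(declaredClassNames(file_get_contents($file))) > 1) { ... }
```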

What if the one-class-per-file rule is disagreeable?

I sometimes hear complaints that the one-class-per-file rule is somehow wasteful, or otherwise not aesthetically pleasing when examining the file system. Isn't it a drag on performance to load that many files? What if some classes are only needed along with some other class, such as an Exception that is only used in one place? I have some responses here:

There is, of course, a performance cost to loading two files instead of one. The question is how much of a cost, and compared to what? I assert that, compared to the other, more likely performance issues in our legacy application, the drag from loading multiple files is a rounding error. It is more likely that we have other, far greater performance concerns. If it really is a problem, using a bytecode cache like APC will reduce or completely remove these comparatively small performance hits.
Consistency, consistency, consistency. If some class files contain only one class and others contain more than one, that inconsistency will later become a source of cognitive friction for everyone on the project. One of the main themes throughout legacy applications is inconsistency; let us reduce that inconsistency as much as we can by adhering to the one-class-per-file rule.

If we feel that some classes naturally belong together, it is perfectly acceptable to place the subordinate or child classes in a subdirectory beneath the master or parent class. The subdirectory should be named for that higher class or namespace, per the PSR-0 naming rules.

For example, if we have a series of Exception classes related to a Foo class:

Foo.php                          # class Foo { ... }
Foo/
    NotFoundException.php        # class Foo_NotFoundException { ... }
    MalformedDataException.php   # class Foo_MalformedDataException { ... }

Renaming classes in this way means we must also change the class names throughout the codebase, wherever they are instantiated or otherwise referenced.
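To double-check where a renamed class should live, it can help to spell out the PSR-0 mapping from class name to relative file path. The sketch below is our own illustration with a hypothetical function name: namespace separators in the namespace portion become directory separators, and so do underscores in the class-name portion.

```php
<?php
/**
 * Map a class name to its relative file path per PSR-0.
 */
function classToPath($class)
{
    $class = ltrim($class, '\\');
    $subpath = '';
    $pos = strrpos($class, '\\');
    if ($pos !== false) {
        // namespace portion: backslashes become directory separators
        $subpath = str_replace('\\', DIRECTORY_SEPARATOR, substr($class, 0, $pos))
                 . DIRECTORY_SEPARATOR;
        $class = substr($class, $pos + 1);
    }
    // class-name portion: underscores become directory separators
    return $subpath . str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
}

foreach (array('Foo', 'Foo_NotFoundException', 'Foo_MalformedDataException') as $class) {
    echo $class . ' => ' . classToPath($class) . PHP_EOL;
}
```

On a Unix-like system, `Foo_NotFoundException` maps to `Foo/NotFoundException.php`, matching the directory layout in the example.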

What if a class or function is defined inline?

I have seen cases where a page script has one or more classes or functions defined inside it, generally when the classes or functions are used only by that particular page script.

In these cases, remove the class definitions from the script and place them in their own files in the central class directory location. Be sure to name the files for their class names per the PSR-0 autoloader rules. Similarly, move the function definitions to their own related class file as static methods, and rename the function calls to static method calls.
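Here is a small before-and-after sketch of moving an inline function. The function `format_money()` and the class `Money` are hypothetical names chosen for illustration; the class is defined inline only so the example runs by itself, and in practice it would live in `classes/Money.php`.

```php
<?php
// Before: defined inline in a page script and called as a global function.
//   function format_money($amount) { return '$' . number_format($amount, 2); }
//   echo format_money($total);

// After: the function becomes a static method on a class in its own file
// (classes/Money.php), so the autoloader can find it.
class Money
{
    public static function format($amount)
    {
        return '$' . number_format($amount, 2);
    }
}

// The call in the page script is renamed to a static method call.
$total = 1234.5;
echo Money::format($total) . PHP_EOL; // prints $1,234.50
```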

What if a definition file also executes logic?

I have also seen the opposite case, where a class file has some logic that gets executed as a result of the file being loaded. For example, a class definition file might look like this:

/path/to/foo.php
<?php
echo "Doing something here ...";
log_to_file('a log entry');
db_query('UPDATE table_name SET incrementor = incrementor + 1');

class Foo
{
    // the class
}
?>

In the above case, the logic before the class definition will be executed when the file is loaded, even if the class is never instantiated or otherwise called.

This is a much tougher situation to deal with than when classes are defined inline with a page script. The class should be loadable without side effects, and the other logic should be executable without having to load the class.

In general, the easiest way to deal with this is to modify our relocation process. Cut the class definition from the original file and place it in its own file in the central class directory location. Leave the original file with its executable code in place, and leave all the related include calls in place as well. This allows us to pull out the class definition so it can be autoloaded, but scripts that include the original file still get the executable behavior.

For example, given the above combined executable code and class definition, we could end up with these two files:

/path/to/foo.php
<?php
echo "Doing something here ...";
log_to_file('a log entry');
db_query('UPDATE table_name SET incrementor = incrementor + 1');
?>

/path/to/app/classes/Foo.php
<?php
class Foo
{
    // the class
}
?>

This is messy, but it preserves the existing application behavior while allowing for autoloading.

What if two classes have the same name?

When we start moving classes around, we may discover that application flow A uses a Foo class, and that application flow B also uses a Foo class, but the two classes of the same name are actually different classes defined in different files. They never conflict with each other because the two different application flows never intersect.

In this case, we have to rename one or both of the classes when we move them to our central class directory location. For example, call one of them FooOne and the other FooTwo, or pick better descriptive names of your own. Place them each in separate class files named for their class names, per the PSR-0 autoloading rules, and rename all references to these classes throughout the codebase.
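Because such collisions are invisible until the classes share one directory, it may be worth detecting them before moving anything. The sketch below assumes we have already built a map of file path to declared class names (for example, by scanning each file's class declarations); the function name is hypothetical. It compares names case-insensitively, since PHP class names are case-insensitive.

```php
<?php
/**
 * Given a map of file path => class names declared in that file, return the
 * (lowercased) class names declared in more than one file, with their files.
 */
function findDuplicateClasses(array $classesByFile)
{
    $filesByClass = array();
    foreach ($classesByFile as $file => $classes) {
        foreach ($classes as $class) {
            $filesByClass[strtolower($class)][] = $file;
        }
    }
    // keep only names that appear in two or more files
    return array_filter($filesByClass, function ($files) {
        return count($files) > 1;
    });
}

$dupes = findDuplicateClasses(array(
    'flow_a/foo.php' => array('Foo'),
    'flow_b/foo.php' => array('Foo'),
    'lib/bar.php'    => array('Bar'),
));
print_r($dupes);
```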

What about third-party libraries?

When we consolidate our classes and functions, we may find some third-party libraries in the legacy application. We don't want to move or rename the classes and functions in a third-party library, because that would make it too difficult to upgrade the library later. We would have to remember what classes were moved where and which functions were renamed to what.

With any luck, the third-party library uses autoloading of some sort already. If it comes with its own autoloader, we can add that autoloader to the SPL autoloader registry stack in our setup or bootstrap code. If its autoloading is managed by another autoloader system, such as that found in Composer, we can add that autoloader to the SPL autoloader registry stack, again in our setup or bootstrap code.
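A minimal sketch of a bootstrap that stacks autoloaders might look like the following. The function names and the class directory are hypothetical; `thirdparty_autoload()` stands in for whatever autoloader the library ships. If the library is managed by Composer, requiring its `vendor/autoload.php` file registers Composer's autoloader on the same SPL stack.

```php
<?php
// Our application's autoloader (simplified; the directory is hypothetical).
function app_autoload($class)
{
    $file = '/path/to/app/classes' . DIRECTORY_SEPARATOR
          . str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
    if (file_exists($file)) {
        require $file;
    }
}

// Stand-in for the autoloader a third-party library would provide.
function thirdparty_autoload($class)
{
    // ... library-specific lookup would go here ...
}

// PHP calls each registered autoloader in order until the class is defined.
spl_autoload_register('app_autoload');
spl_autoload_register('thirdparty_autoload');

// For a Composer-managed library (path is a placeholder):
// require '/path/to/vendor/autoload.php';
```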

If the third-party library does not use autoloading, and depends on include calls both in its own code and in the legacy application, we are in a bit of a bind. We don't want to modify the code in the library, but at the same time we want to remove include calls from the legacy application. There are two least-worst options here:

Modify our application's main autoloader to allow for one or more third-party libraries.
Write an additional autoloader for the third-party library and add it to the SPL autoloader registry stack.

Both of these options are beyond the scope of this book. You will need to examine the library in question, determine its class naming scheme, and come up with appropriate autoloader code on your own.
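Purely as a rough starting point, here is what the second option could look like for a library whose classes all share a common prefix. The prefix `Acme_`, the directory `/path/to/3rdparty/acme/lib`, and both function names are hypothetical; a real library's naming scheme may differ and must be checked first.

```php
<?php
/**
 * Return the file path for a class in the hypothetical Acme library, or
 * null if the class does not belong to that library.
 */
function acme_class_file($class, $baseDir = '/path/to/3rdparty/acme/lib')
{
    $prefix = 'Acme_';
    // leave classes outside this library to the other autoloaders
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $baseDir . DIRECTORY_SEPARATOR
         . str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
}

function acme_autoload($class)
{
    $file = acme_class_file($class);
    if ($file !== null && file_exists($file)) {
        require $file;
    }
}

// Added alongside the application's own autoloader in the bootstrap.
spl_autoload_register('acme_autoload');
```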

Finally, in terms of how to organize third-party libraries in the legacy application, it might be wise to consolidate them all to their own central location in the codebase. For example, this might be under a directory called 3rdparty/ or external_libs/. If we move a library, we should move the entire package, not just its class files, so we can upgrade it properly later. This will also allow us to exclude the central third-party directory from our search for include calls so that we don't get extra search results from files that we don't want to modify.

What about system-wide libraries?

System-wide library collections, like those provided by Horde and PEAR, are a special case of third-party libraries. They are generally located on the server file system outside of the legacy application so they can be available to all applications running on that server. The include statements related to these system-wide libraries generally depend on the include_path settings, or else are referenced by absolute path.

These present a special problem when trying to eliminate include calls that only pull in class and function definitions. If we are lucky enough to be using PEAR-installed libraries, we can modify our existing autoloader to look in two directories instead of one. This is because the PSR-0 naming conventions derive from the Horde/PEAR conventions. The trailing autoloader code changes from this:

<?php
// convert underscores in the class name to directory separators
$subpath .= str_replace('_', DIRECTORY_SEPARATOR, $class);

// the path to our central class directory location
$dir = '/path/to/app/classes';

// prefix with the central directory location and suffix with .php,
// then require it.
require $dir . DIRECTORY_SEPARATOR . $subpath . '.php';
?>

To this:

<?php
// convert underscores in the class name to directory separators
$subpath .= str_replace('_', DIRECTORY_SEPARATOR, $class);

// the paths to our central class directory location and to PEAR
$dirs = array('/path/to/app/classes', '/usr/local/pear/php');
foreach ($dirs as $dir) {
    $file = $dir . DIRECTORY_SEPARATOR . $subpath . '.php';
    if (file_exists($file)) {
        // load the first match only, so a class present in both
        // locations is not declared twice
        require $file;
        return;
    }
}
?>
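For reference, a complete, self-contained version of a multi-directory autoloader might look like the sketch below. The function name is hypothetical and the paths are placeholders. It includes the PSR-0 namespace handling that the trailing fragment assumes has already built `$subpath`, searches the directories in order, and stops at the first match.

```php
<?php
/**
 * Register a PSR-0 style autoloader that searches several base directories
 * in order, e.g. the application's class directory first, then PEAR's.
 */
function register_multi_dir_autoloader(array $dirs)
{
    spl_autoload_register(function ($class) use ($dirs) {
        // namespace portion: backslashes become directory separators
        $class = ltrim($class, '\\');
        $subpath = '';
        $pos = strrpos($class, '\\');
        if ($pos !== false) {
            $subpath = str_replace('\\', DIRECTORY_SEPARATOR, substr($class, 0, $pos))
                     . DIRECTORY_SEPARATOR;
            $class = substr($class, $pos + 1);
        }
        // class-name portion: underscores become directory separators
        $subpath .= str_replace('_', DIRECTORY_SEPARATOR, $class);

        foreach ($dirs as $dir) {
            $file = $dir . DIRECTORY_SEPARATOR . $subpath . '.php';
            if (file_exists($file)) {
                require $file;
                return; // first match wins
            }
        }
    });
}

// Application classes take precedence over PEAR classes (placeholder paths).
register_multi_dir_autoloader(array('/path/to/app/classes', '/usr/local/pear/php'));
```

Listing the application directory first means that if a PEAR package and our own code ever define the same class name, ours is the one that gets loaded.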

For functions, can we use instance methods instead of static methods?

When we consolidated user-defined global functions into classes, we redefined them as static methods. This left them callable from anywhere, just as the original global functions were. If we feel particularly diligent, we can change them from static to instance methods. This involves more work, but in the end it can make testing easier and is a cleaner technical approach. Given our earlier Db example, using instance instead of static methods would look like this:

classes/Db.php
<?php
class Db
{
    public function query($query_string)
    {
        // ... code to perform a query ...
    }

    public function getRow($query_string)
    {
        // ... code to get the first result row ...
    }

    public function getCol($query_string)
    {
        // ... code to get the first column of results ...
    }
}
?>

The only added step when using instance methods instead of static ones is that we need to instantiate the class before calling its methods. That is, instead of this:

<?php
Db::query(...);
?>

We would do this:

<?php
$db = new Db();
$db->query(...);
?>

Even though it is more work in the beginning, I recommend instance methods over static ones. Among other things, it gives us a constructor method that can be called on instantiation, and it makes testing easier in many cases.
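To illustrate the testing benefit, consider the sketch below. `UserFinder` and `FakeDb` are hypothetical classes, and `Db` is reduced to a stub so the example runs on its own. Because `UserFinder` receives a `Db` instance instead of calling `Db::getRow()` statically, a test can hand it a fake that never touches a real database; with static calls, the consumer would be hard-wired to the real class.

```php
<?php
// Stub of the instance-method Db class, so the sketch runs standalone.
class Db
{
    public function getRow($query_string)
    {
        // ... code to get the first result row ...
        throw new RuntimeException('no real database in this sketch');
    }
}

// A consumer that depends on a Db instance passed to its constructor.
class UserFinder
{
    protected $db;

    public function __construct(Db $db)
    {
        $this->db = $db;
    }

    public function findName($id)
    {
        $row = $this->db->getRow('SELECT name FROM users WHERE id = ' . (int) $id);
        return $row['name'];
    }
}

// In a test, a fake subclass stands in for the real database.
class FakeDb extends Db
{
    public function getRow($query_string)
    {
        return array('name' => 'Alice');
    }
}

$finder = new UserFinder(new FakeDb());
echo $finder->findName(1) . PHP_EOL; // prints Alice
```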

If you prefer, you can start by converting to static methods, and later convert the static methods to instance methods, along with all the related method calls. However, your schedule and preferences will dictate which approach you choose.

Can we automate this process?

As I have noted before, this is a tedious, tiresome, and time-consuming process. Depending on the size of the codebase, it may take days or weeks of effort to fully consolidate the classes and functions for autoloading. It would be great if there were some way to automate the process to make it both faster and more reliable.

Unfortunately, I have not yet discovered any tools that make this process easier. As far as I can tell, this kind of refactoring is still best done by hand with strong attention to detail. Obsessive tendencies and long periods of uninterrupted concentration on this task are likely to be of benefit here.