Version Control

Note

Programmers

We keep all our project artifacts in a single, authoritative place.

To work as a team, you need some way to coordinate your source code, tests, and other important project artifacts. A version control system provides a central repository that helps coordinate changes to files and also provides a history of changes.

A project without version control may have snippets of code scattered among developer machines, networked drives, and even removable media. The build process may involve one or more people scrambling to find the latest versions of several files, trying to put them in the right places, and only succeeding through the application of copious caffeine, pizza, and stress.

Important

Continuous Integration

A project with version control uses the version control system to mediate changes. It’s an orderly process in which developers get the latest code from the server, do their work, run all the tests to confirm their code works, then check in their changes. This process, called continuous integration, occurs several times a day for each pair.

Note

If you aren’t familiar with the basics of version control, start learning now. Learning to use a version control system effectively may take a few days, but the benefits are so great that it is well worth the effort.

If multiple developers modify the same file without using version control, they’re likely to accidentally overwrite each other’s changes. To avoid this pain, some developers turn to a locking model of version control: when they work on a file, they lock it to prevent anyone else from making changes. The files in their sandboxes are read-only until locked. If you have to check out a file in order to work on it, then you’re using a locking model.

While this approach solves the problem of accidentally overwriting changes, it can cause other, more serious problems. A locking model makes it difficult to make changes. Team members have to carefully coordinate who is working on which file, and that stifles their ability to refactor and make other beneficial changes. To get around this, teams often turn to strong code ownership, which is the worst of the code ownership models because only one person has the authority to modify a particular file. Collective code ownership is a better approach, but it’s very hard to do if you use file locking.

Instead of a locking model, use a concurrent model of version control. This model allows two people to edit the same file simultaneously. The version control system automatically merges their changes—nothing gets overwritten accidentally. If two people edit the exact same lines of code, the version control system prompts them to merge the two lines manually.
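
To make the merge rule concrete, here is a minimal sketch of a line-by-line three-way merge. It is not how any particular version control system implements merging. It assumes both people edited lines in place, so the three copies have the same number of lines, and the name `merge_three_way` is hypothetical.

```python
def merge_three_way(base, ours, theirs):
    """Merge two edited copies of `base`, line by line.

    Returns (merged_lines, conflicts), where `conflicts` holds the indices
    of lines that both sides changed in different ways.
    Simplifying assumption: edits are in place, so all three lists have
    the same length. Real tools also handle inserted and deleted lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (same change, or no change)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed the same line differently
            merged.append(b)
            conflicts.append(index)
    return merged, conflicts
```

Edits to different lines merge without anyone noticing. Only the lines listed in `conflicts` need a person to decide what the merged line should be.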

Automatic merges may seem risky. They would be risky if it weren’t for continuous integration and the automated build. Continuous integration reduces the scope of merges to a manageable level, and the build, with its comprehensive test suite, confirms that merges work properly.

One of the most powerful uses of a version control system is the ability to go back in time. You can update your sandbox with all the files from a particular point in the past.

This allows you to use diff debugging. When you find a challenging bug that you can’t debug normally, go back in time to an old version of the code when the bug didn’t exist. Then go forward and backward until you isolate the exact check-in that introduced the bug. You can review the changes in that check-in alone to get insight into the cause of the bug. With continuous integration, the number of changes will be small.
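
One way to organize the going forward and backward is a binary search over the check-ins. The sketch below assumes the oldest check-in is known to be good, the newest is known to be bad, and the bug stays present once introduced. `first_bad_checkin` and `has_bug` are hypothetical names; in practice `has_bug` would update your sandbox to that check-in and run a test.

```python
def first_bad_checkin(checkins, has_bug):
    """Return the check-in that introduced the bug.

    `checkins` is ordered oldest to newest. Assumes checkins[0] is good,
    checkins[-1] is bad, and the bug persists once it appears.
    """
    good, bad = 0, len(checkins) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if has_bug(checkins[middle]):
            bad = middle      # bug already present: look earlier
        else:
            good = middle     # still clean: look later
    return checkins[bad]
```

Each step halves the range still under suspicion, so even a long history takes only a handful of builds to search.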

Time travel is also useful for reproducing bugs. If somebody reports a bug and you can’t reproduce it, try using the same version of the code that the reporter is using. If you can reproduce the behavior in the old version but not in the current version, especially with a unit test, you can be confident that the bug is and will remain fixed.

It should be obvious that you should store your source code in version control. It’s less obvious that you should store everything else in there, too. Although most version control systems allow you to go back in time, it doesn’t do you any good unless you can build the exact version you had at that time. Storing the whole project in version control—including the build system—gives you the ability to re-create old versions of the project in full.

As much as possible, keep all your tools, libraries, documentation, and everything else related to the project in version control. Tools and libraries are particularly important. If you leave them out, at some point you’ll update one of them, and then you’ll no longer be able to go back to a time before the update. Or, if you do, you’ll have to painstakingly remember which version of the tool you used to use and manually replace it.

For similar reasons, store the entire project in a single repository. Although it may seem natural to split the project into multiple repositories—perhaps one for each deliverable, or one for source code and one for documentation—this approach increases the opportunities for things to get out of sync.

Perform your update and commit actions on the whole tree as well. Typically, this means updating or committing from the top-level directory. It may be tempting to commit only the directory you’ve been working in, but that leaves you vulnerable to the possibility of having your sandbox split across two separate versions.

The only project-related artifact I don’t keep in version control is generated code. Your automated build should re-create generated code automatically.

There is one remaining exception to what belongs in version control: code you plan to throw away. Spike solutions (see “Spike Solutions” in Chapter 9), experiments, and research projects may remain unintegrated, unless they produce concrete documentation or other artifacts that will be useful for the project. Check in only the useful pieces of the experiment. Discard the rest.

Customer data should go in the repository, too. That includes documentation, notes on requirements (see “Incremental Requirements” in Chapter 9), technical writing such as manuals, and customer tests (see “Customer Tests” in Chapter 9).

When I mention this to programmers, they worry that the version control system will be too complex for customers to use. Don’t underestimate your customers. While it’s true that some version control systems are very complex, most have user-friendly interfaces. For example, the TortoiseSVN Windows client for the open-source Subversion version control system is particularly nice.

Even if your version control system is somewhat arcane, you can always create a pair of simple shell scripts or batch files—one for update and one for commit—and teach your customers how to run them. If you sit together, you can always help your customers when they need to do something more sophisticated, such as time travel or merging.
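
As a rough illustration, such wrappers can be very small. The sketch below uses Python rather than a batch file and assumes the Subversion command-line client (`svn`) is installed and on the path. The function names are hypothetical, and a real script would probably also add any new files before committing.

```python
import subprocess


def update_command():
    """Command that brings the sandbox up to date."""
    return ["svn", "update"]


def commit_command(message):
    """Command that checks in every change in the sandbox."""
    return ["svn", "commit", "-m", message]


def run(command, project_root):
    """Run a command from the top of the project so it covers the whole tree."""
    subprocess.run(command, cwd=project_root, check=True)
```

A customer would then only ever call something like `run(update_command(), project_root)` from a desktop shortcut, without needing to learn the underlying commands.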

One of the most important ideas in XP is that you keep the code clean and ready to ship. It starts with your sandbox. Although you have to break the build in your sandbox in order to make progress, confine it to your sandbox. Never check in code that breaks the build. This allows anybody to update at any time without worrying about breaking their build—and that, in turn, allows everyone to work smoothly and share changes easily.
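
The rule can be sketched as a small guard around the check-in step. This is an illustration only. The three callables are hypothetical stand-ins for whatever your version control system and automated build actually provide.

```python
def integrate(update, build_and_test, check_in):
    """Check in only when the updated sandbox builds and passes its tests.

    `update`, `build_and_test`, and `check_in` are caller-supplied callables.
    `build_and_test` returns True when the build succeeds.
    """
    update()                  # get everyone else's latest changes first
    if not build_and_test():
        return False          # the breakage stays in the sandbox
    check_in()
    return True
```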

Because your build automatically creates a release, any code that builds is theoretically ready to release. In practice, the code may be clean but the software itself won’t be ready for the outside world. Stories will be half-done, user interface elements will be missing, and some things won’t entirely work.

By the end of each iteration, you will have finished all these loose ends. Each story will be “done done,” and you will deploy the software to stakeholders as part of your iteration demo. This software represents a genuine increment of value for your organization. Make sure you can return to it at any time by tagging the tip of the repository. I usually name mine “Iteration X,” where X is the number of iterations we have conducted.

Not every end-of-iteration release to stakeholders gets released to customers. Although it contains completed stories, it may not have enough to warrant a release. When you conduct an actual release, add another tag to the end-of-iteration build to mark the release. I usually name mine “Release Y,” where Y is the number of releases we have conducted.

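To summarize, your code goes through four levels of completion:

1. Broken. This happens only in your sandbox.
2. Builds. This is what’s in the repository.
3. Ready to demo to stakeholders. This is marked with the “Iteration X” tag.
4. Ready to release. This is marked with the “Release Y” tag.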

One of the most devastating mistakes a team can make is to duplicate their codebase. It’s easy to do. First, a customer innocently requests a customized version of your software. To deliver this version quickly, it seems simple to duplicate the codebase, make the changes, and ship it. Yet that copy-and-paste customization doubles the number of lines of code that you need to maintain.

I’ve seen this cripple a team’s ability to deliver working software on a timely schedule. It’s nearly impossible to recombine a duplicated codebase without heroic and immediate action. That one click doesn’t just lead to technical debt; it leads to indentured servitude.

Unfortunately, version control systems actually make this mistake easier to make. Most of these systems provide the option to branch your code—that is, to split the repository into two separate lines of development. This is essentially the same thing as duplicating your codebase.

Branches have their uses, but using them to provide multiple customized versions of your software is risky. Although version control systems provide mechanisms for keeping multiple branches synchronized, doing so is tedious work that steadily becomes more difficult over time. Instead, design your code to support multiple configurations. Use a plug-in architecture, a configuration file, or factor out a common library or framework. Top it off with a build and delivery process that creates multiple versions.
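
As a minimal sketch of the configuration approach, the example below keeps one codebase and describes each customized version as data. Every name in it (`CustomerConfig`, the customers, the settings) is invented for illustration, and a real project might load these values from a configuration file instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerConfig:
    """Settings that differ between customized versions (all hypothetical)."""
    name: str
    currency: str = "USD"
    show_audit_log: bool = False


# One entry per customized version, all living in the same codebase.
CONFIGS = {
    "standard": CustomerConfig(name="standard"),
    "acme": CustomerConfig(name="acme", currency="EUR", show_audit_log=True),
}


def menu_items(config):
    """Shared code reads the configuration instead of being copied and edited."""
    items = ["Orders", "Invoices"]
    if config.show_audit_log:
        items.append("Audit Log")
    return items


def build_all_versions():
    """Stand-in for a build that produces every customized version."""
    return {name: menu_items(config) for name, config in CONFIGS.items()}
```

A fix to `menu_items` reaches every customer at once, which is exactly what a duplicated codebase or long-lived branch makes hard.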

Branches work best when they are short-lived or when you use them for small numbers of changes. If you support old versions of your software, a branch for each version is the best place to put bug fixes and minor enhancements for those versions.

Some teams create a branch in preparation for a release. Half the team continues to perform new work, and the other half attempts to stabilize the old version. In XP, your code shouldn’t require stabilization, so it’s more useful to create such a branch at the point of release, not in preparation for release.

Branches can also be useful for continuous integration and other code management tasks. These private branches live for less than a day. You don’t need private branches to successfully practice XP, but if you’re familiar with this approach, feel free to use it.

Which version control system should I use?

There are plenty of options. In the open source realm, Subversion is popular and particularly good when combined with the TortoiseSVN frontend. Of the proprietary options, Perforce gets good reviews, although I haven’t tried it myself.

Avoid Visual SourceSafe (VSS). VSS is a popular choice for Microsoft teams, but it has numerous flaws and problems with repository corruption—an unacceptable defect in a version control system.

Your organization may already provide a recommended version control system. If it meets your needs, use it. Otherwise, maintaining your own version control system isn’t much work and requires little of a server besides disk space.

Should we really keep all our tools and libraries in version control?

Yes, as much as possible. If you install tools and libraries manually, two undesirable things will happen. First, whenever you make an update, everyone will have to manually update their computer. Second, at some point in the future you’ll want to build an earlier version, and you’ll spend several hours struggling to remember which versions of which tools you need to install.

Some teams address these concerns by creating a “tools and libraries” document and putting it in source control, but it’s a pain to keep such a document up-to-date. Keeping your tools and libraries in source control is a simpler, more effective method.

Some tools and libraries require special installation, particularly on Windows, which makes this strategy more difficult. They don’t all need installation, though—some just come with an installer because it’s a cultural expectation. See if you can use them without installing them, and try to avoid those that you can’t easily use without special configuration.

For tools that require installation, I put their install packages in version control, but I don’t install them automatically in the build script. The same is true for tools that are useful but not necessary for the build, such as IDEs and diff tools.

How can we store our database in version control?

Rather than storing the database itself in version control, set up your build to initialize your database schema and migrate between versions. Store the scripts to do this in version control.
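
Here is one possible shape for such scripts, sketched with Python’s built-in `sqlite3` module. The table, the migration steps, and the use of SQLite’s `user_version` pragma to record the schema version are all assumptions made for the example; your DBMS and build tool will differ.

```python
import sqlite3

# Ordered migration steps, kept in version control with the code.
# Step N upgrades the schema from version N to version N + 1.
MIGRATIONS = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customers ADD COLUMN email TEXT",
]


def schema_version(conn):
    """Read the schema version that SQLite stores in the database file."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, target=None):
    """Apply any migrations between the current version and `target`."""
    if target is None:
        target = len(MIGRATIONS)
    for version in range(schema_version(conn), target):
        conn.execute(MIGRATIONS[version])
        # PRAGMA values cannot be bound as parameters; `version` is an int.
        conn.execute(f"PRAGMA user_version = {version + 1}")
    conn.commit()
    return schema_version(conn)
```

Running `migrate` on an empty database initializes the schema, and running it on an older database applies only the missing steps, so the same script serves both purposes.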

How much of our core platform should we include in version control?

In order for time travel to work, you need to be able to exactly reproduce your build environment for any point in the past. In theory, everything required to build should be in version control, including your compiler, language framework, and even your database management system (DBMS) and operating system (OS). Unfortunately, this isn’t always practical. I include as much as I can, but I don’t usually include my DBMS or operating system.

Some teams keep an image of their entire OS and installed software in version control. This is an intriguing idea, but I haven’t tried it.

With so many things in version control, how can I update as quickly as I need to?

Slow updates may be a sign of a poor-quality version control system. The speed of better systems depends on the number of files that have changed, not the total number of files in the system.

One way to make your updates faster is to be selective about what parts of your tools and libraries you include. Rather than including the entire distribution—documentation, source code, and all—include only the bare minimum needed to build. Many tools only need a handful of files to execute. Include distribution package files in case someone needs more details in the future.

How should we integrate source code from other projects? We have read-only access to their repositories.

If you don’t intend to change their code and you plan on updating infrequently, you can manually copy their source code into your repository.

If you have more sophisticated needs, many version control systems will allow you to integrate with other repositories. Your system will automatically fetch their latest changes when you update. It will even merge your changes to their source code with their updates. Check your version control system’s documentation for more details.

Be cautious of making local changes to third-party source code; this is essentially a branch, and it incurs the same synchronization challenges and maintenance overhead that any long-lived branch does. If you find yourself making modifications beyond vendor-supplied configuration files, consider pushing those changes upstream, back to the vendor, as soon as possible.

We sometimes share code with other teams and departments. Should we give them access to our repository?

Certainly. You may wish to provide read-only access unless you have well-defined ways of coordinating changes from other teams.

With good version control practices, you can easily coordinate changes with other members of the team and reproduce old versions of your software when you need to. Long after your project has finished, your organization can recover your code and rebuild it when they need to.

You should always use some form of version control, even on small one-person projects. Version control will act as a backup and protect you when you make sweeping changes.

Concurrent editing, on the other hand, can be dangerous if an automatic merge fails and goes undetected. Be sure you have a decent build if you allow concurrent edits. Concurrent editing is also safer and easier if you practice continuous integration and have good tests.

There is no practical alternative to version control.

You may choose to use file locking rather than concurrent editing. Unfortunately, this approach makes refactoring and collective code ownership very difficult, if not impossible. You can alleviate this somewhat by keeping a list of proposed refactorings and scheduling them, but the added overhead is likely to discourage people from suggesting significant refactorings.

[Mason] is a good introduction to the nuts and bolts of version control that specifically focuses on Subversion.

[Sink], at http://www.ericsink.com/scm/source_control.html, is a helpful introduction to version control for programmers with a Microsoft background.

[Berczuk & Appleton] goes into much more detail about the ways in which to use version control.



[26] Thanks to Andreas Kö for demonstrating this.