Programmers
We keep our code ready to ship.
Most software development efforts have a hidden delay between when the team says “we’re done” and when the software is actually ready to ship. Sometimes that delay can stretch on for months. It’s the little things: merging everyone’s pieces together, creating an installer, prepopulating the database, building the manual, and so forth. Meanwhile, the team gets stressed out because they forgot how long these things take. They rush, leave out helpful build automation, and introduce more bugs and delays.
The ultimate goal is to be able to deploy at any time.
Continuous integration is a better approach. It keeps everybody’s code integrated and builds release infrastructure along with the rest of the application. The ultimate goal of continuous integration is to be able to deploy all but the last few hours of work at any time.
Practically speaking, you won’t actually release software in the middle of an iteration. Stories will be half-done and features will be incomplete. The point is to be technologically ready to release even if you’re not functionally ready to release.
If you’ve ever experienced a painful multiday (or multiweek) integration, integrating every few hours probably seems foolish. Why go through that hell so often?
Actually, short cycles make integration less painful. Shorter cycles lead to smaller changes, which means there are fewer chances for your changes to overlap with someone else’s.
That’s not to say collisions don’t happen. They do. They’re just not very frequent because everybody’s changes are so small.
Collisions are most likely when you’re making wide-ranging changes. When you do, let the rest of the team know beforehand so they can integrate their changes and be ready to deal with yours.
In order to be ready to deploy all but the last few hours of work, your team needs to do two things:
Integrate your code every few hours.
Keep your build, tests, and other release infrastructure up-to-date.
To integrate, update your sandbox with the latest code from the repository, make sure everything builds, then commit your code back to the repository. You can integrate any time you have a successful build. With test-driven development, that should happen every few minutes. I integrate whenever I make a significant change to the code or create something I think the rest of the team will want right away.
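The update, build, commit sequence described above can be sketched as a small script. This is only an illustration under stated assumptions: the three commands are passed in as parameters because they depend on your version control system and build tool, and the names `integrate` and `run_command` are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

Command = Sequence[str]


def run_command(command: Command) -> bool:
    """Run one command and report whether it exited successfully."""
    return subprocess.run(list(command)).returncode == 0


def integrate(update: Command, build: Command, commit: Command,
              run: Callable[[Command], bool] = run_command) -> bool:
    """Update the sandbox, build (including tests), then commit.

    Stops at the first failing step, so code that does not build
    after the update is never committed.
    """
    for step in (update, build, commit):
        if not run(step):
            return False
    return True
```

Passing `run` as a parameter makes it possible to try the sequence with a stand-in function before pointing it at real commands.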
Many teams have a rule that you have to integrate before you go home at the end of the day. If you can’t integrate, they say, something has gone wrong and you should throw away your code and start fresh the next day. This rule seems harsh, but it’s actually a very good rule. With test-driven development, if you can’t integrate within a few minutes, you’re likely stuck.
Toss out your recent changes and start over when you get badly stuck.
Each integration should get as close to a real release as possible. The goal is to make preparing for a release such an ordinary occurrence that, when you actually do ship, it’s a nonevent.[27] Some teams that use continuous integration automatically burn an installation CD every time they integrate. Others create a disk image or, for network-deployed products, automatically deploy to staging servers.
When was the last time you spent hours chasing down a bug in your code, only to find that it was a problem with your computer’s configuration or in somebody else’s code? Conversely, when was the last time you spent hours blaming your computer’s configuration (or somebody else’s code), only to find that the problem was in code you just wrote?
On typical projects, when we integrate, we don’t have confidence in the quality of our code or in the quality of the code in the repository. The scope of possible errors is wide; if anything goes wrong, we’re not sure where to look.
Reducing the scope of possible errors is the key to developing quickly. If you have total confidence that your software worked five minutes ago, then only the actions you’ve taken in the last five minutes could cause it to fail now. That reduces the scope of the problem so much that you can often figure it out just by looking at the error message—there’s no debugging necessary.
Agree as a team never to break the build.
To achieve this, agree as a team never to break the build. This is easier than it sounds: you can actually guarantee that the build will never break (well, almost never) by following a little script.
To guarantee an always-working build, you have to solve two problems. First, you need to make sure that what works on your computer will work on anybody’s computer. (How often have you heard the phrase, “But it worked on my machine!”?) Second, you need to make sure nobody gets code that hasn’t been proven to build successfully.
To do this, you need a spare development machine to act as a central integration machine. You also need some sort of physical object to act as an integration token. (I use a rubber chicken. Stuffed toys work well, too.)
With an integration machine and integration token, you can ensure a working build in several simple steps.
Run a full build to make sure everything compiles and passes tests after you get the code. If it doesn’t, something went wrong. The most common problem is a configuration issue on your machine. Try running a build on the integration machine. If it works, debug the problem on your machine. If it doesn’t work, find the previous integrators and beat them about the head and shoulders, if only figuratively.
Update from the repository (follow the previous script). Resolve any integration conflicts and run the build (including tests) to prove that the update worked.
Get the integration token and check in your code.
Go over to the integration machine, get the changes, and run the build (including tests).
Replace the integration token.
If the build fails on the integration machine, you have to fix the problem before you give up the integration token. The fastest way to do so is to roll back your changes. However, if nobody is waiting for the token, you can just fix the problem on your machine and check in again.
Avoid fixing problems manually on the integration machine. If the build worked on your machine, you probably forgot to add a file or a new configuration to the build script. In either case, if you correct the problem manually, the next people to get the code won’t be able to build.
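The integration steps and the roll-back rule can be modeled as a short function. This is a sketch, not a prescribed implementation: every callback name here is hypothetical, and it always takes the roll-back path on failure, although the text also allows fixing the problem on your own machine when nobody is waiting for the token.

```python
from typing import Callable


def integrate_with_token(take_token: Callable[[], None],
                         check_in: Callable[[], None],
                         build_on_integration_machine: Callable[[], bool],
                         roll_back: Callable[[], None],
                         return_token: Callable[[], None]) -> str:
    """Follow the integration script; return the token only on a passing build."""
    take_token()
    check_in()
    if build_on_integration_machine():
        return_token()
        return "integrated"

    # The build failed: roll back and prove the old code still builds.
    roll_back()
    if build_on_integration_machine():
        return_token()
        return "rolled back"

    # Still failing: the token stays with us until the build is fixed.
    raise RuntimeError("build still failing after rollback; keep the token and debug")
```

The token is returned in only two places, both immediately after a passing build on the integration machine. That is the known-good guarantee expressed in code.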
Get the team to agree to continuous integration rather than imposing it on them.
The most important part of adopting continuous integration is getting people to agree to integrate frequently (every few hours) and never to break the build. Agreement is the key to adopting continuous integration because there’s no way to force people not to break the build.
If you’re starting with XP on a brand-new project, continuous integration is easy to do. In the first iteration, install a version control system. Introduce a 10-minute build with the first story, and grow your release infrastructure along with the rest of your application. If you are disciplined about continuing these good habits, you’ll have no trouble using continuous integration throughout your project.
If you’re introducing XP to an existing project, your tests and build may not yet be good enough for continuous integration. Start by automating your build (see “Ten-Minute Build” earlier in this chapter), then add tests. Slowly improve your release infrastructure until you can deploy at any time.
The most common problem facing teams practicing continuous integration is slow builds. Whenever possible, keep your build under 10 minutes. On new projects, you should be able to keep your build under 10 minutes all the time. On a legacy project, you may not achieve that goal right away. You can still practice continuous integration, but it comes at a cost.
When you use the integration script discussed earlier, you’re using synchronous integration: you confirm that the build and tests succeed before moving on to your next task. If the build is too slow, synchronous integration becomes untenable. (For me, 20 or 30 minutes is too slow.) In this case, you can use asynchronous integration instead: start your next task immediately after starting the build, without waiting for the build and tests to succeed.
The biggest problem with asynchronous integration is that it tends to result in broken builds. If you check in code that doesn’t work, you have to interrupt what you’re doing when the build breaks half an hour or an hour later. If anyone else checked out that code in the meantime, their build won’t work either. If the pair that broke the build has gone home or to lunch, someone else has to clean up the mess. In practice, the desire to keep working on the task at hand often overrides the need to fix the build.
If you have a very slow build, asynchronous integration may be your only option. If you must use this, a continuous integration server is the best way to do so. It will keep track of what to build and will automatically notify you when the build has finished.
Switch to synchronous integration when you can.
Over time, continue to improve your build script and tests (see “Ten-Minute Build” earlier in this chapter). Once the build time gets down to a reasonable number (15 or 20 minutes), switch to synchronous integration. Continue improving the speed of the build and tests until synchronous integration feels like a pleasant break rather than a waste of time.
Some teams have sophisticated tests, measuring such qualities as performance, load, or stability, that simply cannot finish in under 10 minutes. For these teams, multistage integration is a good idea.
A multistage integration consists of two separate builds. The normal 10-minute build, or commit build, contains all the normal items necessary to prove that the software works: unit tests, integration tests, and a handful of end-to-end tests (see “Test-Driven Development” in Chapter 9 for more about these types of tests). This build runs synchronously as usual.
In addition to the regular build, a slower secondary build runs asynchronously. This build contains the additional tests that do not run in a normal build: performance tests, load tests, and stability tests.
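The split between the two builds might look like the following sketch. The function name `multistage_integration` is hypothetical, each check is modeled as a callable returning True or False, and starting the secondary build only after the commit build passes is an assumption rather than something the text specifies.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import Callable, Optional, Sequence, Tuple

Check = Callable[[], bool]


def multistage_integration(commit_checks: Sequence[Check],
                           secondary_checks: Sequence[Check],
                           executor: Executor) -> Tuple[bool, Optional[Future]]:
    """Run the commit build synchronously, then queue the secondary build."""
    # Commit build: the caller waits for this result before moving on.
    if not all(check() for check in commit_checks):
        return False, None

    # Secondary build: performance, load, and stability checks run in the
    # background; the caller inspects the returned future later.
    def run_secondary() -> bool:
        return all(check() for check in secondary_checks)

    return True, executor.submit(run_secondary)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        ok, pending = multistage_integration([lambda: True], [lambda: True], pool)
        print("commit build passed:", ok)
        print("secondary build passed:", pending.result())
```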
Prefer improved tests to a multistage integration.
Although a multistage build is a good idea for a mature project with sophisticated testing, most teams I encounter use multistage integration as a workaround for a slow test suite. I prefer to improve the test suite instead; it’s more valuable to get better feedback more often.
If this is the case for you, a multistage integration might help you transition from asynchronous to synchronous integration. However, although a multistage build is better than completely asynchronous integration, don’t let it stop you from continuing to improve your tests. Switch to fully synchronous integration when you can; only synchronous integration guarantees a known-good build.
I know we’re supposed to integrate at least every four hours, but what if our current story or task takes longer than that?
You can integrate at any time, even when the task or story you’re working on is only partially done. The only requirement is that the code builds and passes its tests.
What should we do while we’re waiting for the integration build to complete?
Take a break. Get a cup of tea. Perform ergonomic stretches. Talk with your partner about design, refactoring opportunities, or next steps. If your build is under 10 minutes, you should have time to clear your head and consider the big picture without feeling like you’re wasting time.
Isn’t asynchronous integration more efficient than synchronous integration?
Although asynchronous integration may seem like a more efficient use of time, in practice it tends to disrupt flow and leads to broken builds. If the build fails, you have to interrupt your new task to roll back and fix the old one. This means you must leave your new task half-done, switch contexts (and sometimes partners) to fix the problem, then switch back. It’s wasteful and annoying.
Instead of switching gears in the middle of a task, many teams let the build remain broken for a few hours while they finish the new task. If other people integrate during this time, the existing failures hide any new failures in their integration. Problems compound and cause a vicious cycle: painful integrations lead to longer broken builds, which lead to more integration problems, which lead to more painful integrations. I’ve seen teams that practice asynchronous integration leave the build broken for days at a time.
Remember, too, that the build should run in under 10 minutes. Given a fast build, the supposed inefficiency of synchronous integration is trivial, especially as you can use that time to reflect on your work and talk about the big picture.
Are you saying that asynchronous integration will never work?
You can make asynchronous integration work if you’re disciplined about keeping the build running fast, checking in frequently, running the build locally before checking in, and fixing problems as soon as they’re discovered. In other words, do all the good things you’re supposed to do with continuous integration.
Synchronous integration makes you confront these issues head on, which is why it’s so valuable. Asynchronous integration, unfortunately, makes it all too easy to ignore slow and broken builds. You don’t have to ignore them, of course, but my experience is that teams using asynchronous integration have slow and broken builds much more often than teams using synchronous integration.
Ron Jeffries said it best:[28]
When I visit clients with asynchronous builds, I see these things happening, I think it’s fair to say invariably:
The “overnight” build breaks at least once when I’m there;
The build lamp goes red at least once when I’m there, and stays that way for more than an hour.
With a synchronous build, once in a while you hear one pair say “Oh, shjt.”
I’m all for more automation. But I think an asynch build is like shutting your eyes right when you drive through the intersection.
Our version control system doesn’t allow us to roll back quickly. What should we do?
The overriding rule of the known-good build is that you must know the build works when you put the integration token back. Usually, that means checking in, running the build on the integration machine, and seeing it pass. Sometimes—we hope not often—it means rolling back your check-in, running the old build, and seeing that pass instead.
If your version control system cannot support this, consider getting one that does. Not being able to revert easily to a known-good point in history is a big danger sign. You need to be able to revert a broken build with as much speed and as little pain as possible so you can get out of the way of other people waiting to integrate. If your version control can’t do this for you, create an automated script that will.
One way to script this is to check out the older version to a temporary sandbox. Delete all the files in the regular sandbox except for the version control system’s metadata files, then copy all the nonmetadata files over from the older version. This will allow you to check in the old version on top of the new one.
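A minimal sketch of the copy step follows. It assumes the older version is already checked out to a temporary sandbox, and that the version control system keeps its metadata in a single top-level directory. The function name `overlay_old_version` and the default metadata name are hypothetical, and a system that stores metadata in every subdirectory would need a more careful walk.

```python
import shutil
from pathlib import Path
from typing import FrozenSet


def overlay_old_version(old_checkout: Path, sandbox: Path,
                        metadata_names: FrozenSet[str] = frozenset({".svn"})) -> None:
    """Make the sandbox's files match old_checkout, keeping the sandbox's metadata."""
    # 1. Delete everything in the regular sandbox except the metadata.
    for entry in sandbox.iterdir():
        if entry.name in metadata_names:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # 2. Copy the nonmetadata files over from the older version.
    skip_metadata = shutil.ignore_patterns(*metadata_names)
    for entry in old_checkout.iterdir():
        if entry.name in metadata_names:
            continue
        target = sandbox / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, ignore=skip_metadata)
        else:
            shutil.copy2(entry, target)
```

After this runs, the sandbox holds the old files under the new metadata, so a normal check-in records the old version on top of the new one. Files that exist only in the newer version show up as deletions, which most version control systems require you to confirm explicitly.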
We rolled back our check-in, but the build is still failing on the integration machine. What do we do now?
Oops, you’ve almost certainly exposed some sort of configuration bug. It’s possible the bug was in your just-integrated build script, but it’s equally possible there was a latent bug in one of the previous scripts and you accidentally exposed it. (Lucky you.)
Either way, the build has to work before you give up the integration token. Now you debug the problem. Enlist the help of the rest of the team if you need to; a broken integration machine is a problem that affects everybody.
Why do we need an integration machine? Can’t we just integrate locally and check in?
In theory, if the build works on your local machine, it should work on any machine. In practice, don’t count on it. The integration machine is a nice, pristine environment that helps prove the build will work anywhere. For example, I occasionally forget to check in a file; watching the build fail on the integration machine when it passed on mine makes my mistake obvious.
Nothing’s perfect, but building on the integration machine does eliminate the majority of cross-machine build problems.
I seem to always run into problems when I integrate. What am I doing wrong?
One cause of integration problems is infrequent integration. The less often you integrate, the more changes you have to merge. Try integrating more often.
Another possibility is that your code tends to overlap with someone else’s. Try talking more about what you’re working on and coordinating more closely with the pairs that are working on related code.
If you’re getting a lot of failures on the integration machine, you probably need to do more local builds before checking in. Run a full build (with tests) before you integrate to make sure your code is OK, then another full build (with tests) afterward to make sure the integrated code is OK. If that build succeeds, you shouldn’t have any problems on the integration machine.
I’m constantly fixing the build when other people break it. How can I get them to take continuous integration seriously?
It’s possible that your teammates haven’t all bought into the idea of continuous integration. I often see teams in which only one or two people have any interest in continuous integration. Sometimes they try to force continuous integration on their teammates, usually by installing a continuous integration server without their consent. It’s no surprise that the team reacts to this sort of behavior by ignoring broken builds. In fact, it may actually decrease their motivation to keep the build running clean.
Talk to the team about continuous integration before trying to adopt it. Discuss the trade-offs as a group, collaboratively, and make a group decision about whether to apply it.
If your team has agreed to use continuous integration but is constantly breaking the build anyway, perhaps you’re using asynchronous integration. Try switching to synchronous integration, and follow the integration script exactly.
When you integrate continuously, releases are a painless event. Your team experiences fewer integration conflicts and confusing integration bugs. The on-site customers see progress in the form of working code as the iteration progresses.
Don’t try to force continuous integration on a group that hasn’t agreed to it. This practice takes everyone’s willful cooperation.
Using continuous integration without a version control system and a 10-minute build is painful.
Synchronous integration becomes frustrating if the build is longer than 10 minutes and too wasteful if the build is very slow. My threshold is 20 minutes. The best solution is to speed up the build.
A physical integration token only works if all the developers sit together. You can use a continuous integration server or an electronic integration token instead, but be careful to find one that’s as easy to use and as obvious as a physical token.
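One possible electronic token is a file that only one pair can create at a time. The sketch below assumes a path on a file system that every developer can reach. The class name `FileToken` and its methods are hypothetical, and a file is less visible than a rubber chicken, so pair it with some kind of obvious display.

```python
import os
from pathlib import Path
from typing import Optional


class FileToken:
    """An integration token represented by the existence of a file."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def take(self, who: str) -> bool:
        """Try to take the token; return False if someone else holds it."""
        try:
            # O_EXCL makes creation fail if the file already exists,
            # so two pairs cannot both take the token.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        with os.fdopen(fd, "w") as token_file:
            token_file.write(who)
        return True

    def holder(self) -> Optional[str]:
        """Return the current holder's name, or None if the token is free."""
        try:
            return self.path.read_text()
        except FileNotFoundError:
            return None

    def put_back(self, who: str) -> None:
        """Return the token; only the current holder may do so."""
        if self.holder() != who:
            raise RuntimeError(f"{who} does not hold the integration token")
        self.path.unlink()
```

As with the physical token, put it back only after the build has passed on the integration machine.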
Integration tokens don’t work at all for very large teams; people spend too much time waiting to integrate. Use private branches in your version control system instead. Check your code into a private branch, build the branch on an integration machine—you can have several—then promote the branch to the mainline if the build succeeds.
If you can’t perform synchronous continuous integration, try using a CI server and asynchronous integration. This will likely lead to more problems than synchronous integration, but it’s the best of the alternatives.
If you don’t have an automated build, you won’t be able to practice asynchronous integration. Delaying integration is a very high-risk activity. Instead, create an automated build as soon as possible, and start practicing one of the forms of continuous integration.
Some teams perform a daily build and smoke test. Continuous integration is a more advanced version of the same practice; if you have a daily build and smoke test, you can migrate to continuous integration. Start with asynchronous integration and steadily improve your build and tests until you can use synchronous integration.
[27] ... except for the release party, of course.
[28] Via the art of agile mailing list, http://tech.groups.yahoo.com/group/art-of-agile/message/365.