Chapter 44. The Evergreen Tree

Jesse Houwing

Imagine you’re working on a product that consists of both software and hardware, and it’s big. Imagine that this product can save lives, but—when used incorrectly—could also kill. This is the reality for many teams working on large hospital appliances.

I’ve worked with one such team. Rather, I should say teams: eighteen of them integrated their work into a single codebase. When I first entered the building that hosted some of these teams, I felt like I was entering a research facility from the computer game Half-Life. People were running around, and computers, pieces of machinery, and old versions of the product were everywhere. At some point, yarn appeared everywhere, an attempt by the teams to visualize the many dependencies.

It takes a lot of discipline not to break things with so many people working on the same product across different buildings and countries, and not to accidentally introduce new, unwanted, and unexpected “features.” To keep the code at least consistent, we put a Continuous Integration (CI) system in place. While that ensured that much of the code was covered by unit or integration tests, it’s hard to tell whether the product does anything it isn’t supposed to unless 100% of your code and functionality is covered by tests.

We made the dangerous situation of a broken integration immediately visible on some old desktops conveniently placed on filing cabinets in the team rooms and in the hallways. Teams would take immediate ownership when the integration broke, understanding that 199 colleagues would be blocked because of it. By keeping that discipline and not trying to work around such blockages, many problems were solved quickly, and better ways to work together emerged over time.

One thing that is hard to solve with CI is the collaboration between human beings. We tackled that challenge in line with our CI approach.

Leaving a potential problem to simmer without acting on it is much more expensive than dealing with it here and now. With a single email to a special mailbox, anyone could cause all the screens in the team rooms and hallways to go red, stopping integration and pausing the work of all 199 coworkers. This would trigger an immediate response from all teams, each sending a representative to address the possible problem. Literally: team representatives actually met in a designated place, reviewed the issue, and took action.

And if the assumed problem turned out to be no problem in the end, you would still be rewarded for your courage to act anyway. The few times that there actually was a problem, the consequences could not have been tackled sooner, and the impact was guaranteed to be minimized. Failing to pull the cord and then later saying, “I knew it! I’ve been trying to tell you all along!” would lead to a more difficult conversation about trust and courage.

It takes a lot of discipline to keep your code integrated at all times and all your tests passing. It takes even more discipline, and the right conversations, to really integrate your teams.