In the previous chapter, we created the automated testing practices to ensure that developers get fast feedback on the quality of their work. This becomes even more important as we increase the number of developers and the number of branches they work on in version control.
The ability to “branch” in version control systems was created primarily to enable developers to work on different parts of the software system in parallel, without the risk of individual developers checking in changes that could destabilize or introduce errors into trunk (sometimes also called master or mainline).†
However, the longer developers are allowed to work in their branches in isolation, the more difficult it becomes to integrate and merge everyone’s changes back into trunk. In fact, integrating those changes becomes exponentially more difficult as we increase the number of branches and the number of changes in each code branch.
Integration problems result in a significant amount of rework to get back into a deployable state, including conflicting changes that must be manually merged or merges that break our automated or manual tests, usually requiring multiple developers to successfully resolve. And because integration has traditionally been done at the end of the project, when it takes far longer than planned, we are often forced to cut corners to make the release date.
This causes another downward spiral: when merging code is painful, we tend to do it less often, making future merges even worse. Continuous integration was designed to solve this problem by making merging into trunk a part of everyone’s daily work.
The surprising breadth of problems that continuous integration solves, as well as the solutions themselves, are exemplified in Gary Gruver’s experience as the director of engineering for HP’s LaserJet Firmware division, which builds the firmware that runs all their scanners, printers, and multifunction devices.
The team consisted of four hundred developers distributed across the US, Brazil, and India. Despite the size of their team, they were moving far too slowly. For years, they were unable to deliver new features as quickly as the business needed.
Gruver described the problem thusly: “Marketing would come to us with a million ideas to dazzle our customer, and we’d just tell them, ‘Out of your list, pick the two things you’d like to get in the next six to twelve months.’”
They were only completing two firmware releases per year, with the majority of their time spent porting code to support new products. Gruver estimated that only 5% of their time was spent creating new features—the rest of the time was spent on non-productive work associated with their technical debt, such as managing multiple code branches and manual testing, as shown below:
Gruver and his team created a goal of increasing the time spent on innovation and new functionality by a factor of ten. The team hoped this goal could be achieved through:
Before this, each product line would require a new code branch, with each model having a unique firmware build with capabilities defined at compile time.‡ The new architecture would have all developers working in a common code base, with a single firmware release supporting all LaserJet models built off of trunk, with printer capabilities being established at runtime in an XML configuration file.
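The shift from compile-time to runtime capabilities can be illustrated with a minimal sketch. The XML layout, the model names, and the function `load_capabilities` are hypothetical, invented here for illustration; the source does not describe HP's actual configuration format. The idea is only that a single build reads its capabilities from configuration instead of having them fixed per build.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: element names, attribute names, and model
# names are assumptions made for this sketch, not HP's real format.
CONFIG_XML = """
<models>
  <model name="model-a">
    <capability name="duplex" enabled="true"/>
    <capability name="scanner" enabled="false"/>
  </model>
  <model name="model-b">
    <capability name="duplex" enabled="true"/>
    <capability name="scanner" enabled="true"/>
  </model>
</models>
"""


def load_capabilities(xml_text, model_name):
    """Return the set of capability names enabled for one model."""
    root = ET.fromstring(xml_text.strip())
    for model in root.findall("model"):
        if model.get("name") == model_name:
            return {
                cap.get("name")
                for cap in model.findall("capability")
                if cap.get("enabled") == "true"
            }
    raise KeyError(f"unknown model: {model_name}")


# The same code serves every model; only the configuration differs.
for name in ("model-a", "model-b"):
    print(name, sorted(load_capabilities(CONFIG_XML, name)))
```

With this kind of design, supporting a new model may mean adding a configuration entry instead of creating and maintaining another code branch.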
Four years later, they had one codebase supporting all twenty-four HP LaserJet product lines being developed on trunk. Gruver admits trunk-based development requires a big mindset shift. Engineers thought trunk-based development would never work, but once they started, they couldn’t imagine ever going back. As Gruver recalled, “Over the years we’ve had several engineers leave HP, and they would call me to tell me about how backward development was in their new companies, pointing out how difficult it is to be effective and release good code when there is no feedback that continuous integration gives them.”
However, trunk-based development required them to build more effective automated testing. Gruver observed, “Without automated testing, continuous integration is the fastest way to get a big pile of junk that never compiles or runs correctly.” In the beginning, a full manual testing cycle required six weeks.
In order to have all firmware builds automatically tested, they invested heavily in their printer simulators and created a testing farm in six weeks—within a few years two thousand printer simulators ran on six racks of servers that would load the firmware builds from their deployment pipeline. Their continuous integration (CI) system ran their entire set of automated unit, acceptance, and integration tests on builds from trunk, just as described in the previous chapter. Furthermore, they created a culture that halted all work anytime a developer broke the deployment pipeline, ensuring that developers quickly brought the system back into a green state.
Automated testing created fast feedback that enabled developers to quickly confirm that their committed code actually worked. Unit tests would run on their workstations in minutes, while three levels of automated testing would run on every commit as well as every two and four hours. The final full regression testing would run every twenty-four hours. During this process, they:
This level of productivity could never have been supported prior to adopting continuous integration, when merely creating a green build required days of heroics. The resulting business benefits were astonishing:
What Gruver’s experience shows is that, after comprehensive use of version control, continuous integration is one of the most critical practices that enable the fast flow of work in our value stream, enabling many development teams to independently develop, test, and deliver value. Nevertheless, continuous integration remains a controversial practice. The remainder of this chapter describes the practices required to implement continuous integration, as well as how to overcome common objections.
As described in the previous chapters, whenever changes are introduced into version control that cause our deployment pipeline to fail, we quickly swarm the problem to fix it, bringing our deployment pipeline back into a green state. However, significant problems result when developers work in long-lived private branches (also known as “feature branches”), only merging back into trunk sporadically, resulting in a large batch size of changes. As described in the HP LaserJet example, what results is significant chaos and rework in order to get their code into a releasable state.
Jeff Atwood, founder of the Stack Overflow site and author of the Coding Horror blog, observes that while there are many branching strategies, they can all be put on the following spectrum:
Atwood’s observation is absolutely correct. Stated more precisely, the required effort to successfully merge branches back together increases exponentially as the number of branches increases. The problem lies not only in the rework this “merge hell” creates, but also in the delayed feedback we receive from our deployment pipeline. For instance, instead of performance testing against a fully integrated system happening continuously, it will likely happen only at the end of our process.
Furthermore, as we increase the rate of code production by adding more developers, we increase both the probability that any given change will impact someone else and the number of developers who will be impacted when someone breaks the deployment pipeline.
Here is one last troubling side effect of large batch size merges: when merging is difficult, we become less able and motivated to improve and refactor our code, because refactorings are more likely to cause rework for everyone else. When this happens, we are more reluctant to modify code that has dependencies throughout the codebase, which is (tragically) where we may have the highest payoffs.
This is how Ward Cunningham, developer of the first wiki, first described technical debt: when we do not aggressively refactor our codebase, it becomes more difficult to make changes and to maintain over time, slowing down the rate at which we can add new features. Solving this problem was one of the primary reasons behind the creation of continuous integration and trunk-based development practices, to optimize for team productivity over individual productivity.
Our countermeasure to large batch size merges is to institute continuous integration and trunk-based development practices, where all developers check in their code to trunk at least once per day. Checking code in this frequently reduces our batch size to the work performed by our entire developer team in a single day. The more frequently developers check in their code to trunk, the smaller the batch size and the closer we are to the theoretical ideal of single-piece flow.
Frequent code commits to trunk mean we can run all automated tests on our software system as a whole and receive alerts when a change breaks some other part of the application or interferes with the work of another developer. And because we can detect merge problems when they are small, we can correct them faster.
We may even configure our deployment pipeline to reject any commits (e.g., code or environment changes) that take us out of a deployable state. This method is called gated commits, where the deployment pipeline first confirms that the submitted change will successfully merge, build as expected, and pass all the automated tests before actually being merged into trunk. If not, the developer will be notified, allowing corrections to be made without impacting anyone else in the value stream.
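The gated commit flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the behavior of any particular CI tool: the `Change` type, the `submit` function, and the three stand-in checks are hypothetical names, and a real pipeline would invoke actual merge, build, and test tooling in their place.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Change:
    author: str
    description: str


# A check receives a change and returns True when it passes.
Check = Callable[[Change], bool]


def submit(change: Change, checks: List[Tuple[str, Check]], trunk: List[Change]) -> str:
    """Merge the change into trunk only if every check passes, in order."""
    for name, check in checks:
        if not check(change):
            # Trunk is untouched, so nobody else is affected by the failure.
            return f"rejected: {name} failed, notify {change.author}"
    trunk.append(change)
    return "merged"


# Stand-in checks (assumptions for this sketch only).
def merges_cleanly(change: Change) -> bool:
    return True


def builds(change: Change) -> bool:
    return True


def passes_tests(change: Change) -> bool:
    return "broken" not in change.description


CHECKS = [("merge", merges_cleanly), ("build", builds), ("tests", passes_tests)]

trunk: List[Change] = []
print(submit(Change("dev-a", "add duplex option"), CHECKS, trunk))
print(submit(Change("dev-b", "broken refactoring"), CHECKS, trunk))
print(len(trunk))
```

The key design point is that the change reaches trunk only after the last check succeeds, so a failing change is reported to its author while trunk stays in a deployable state.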
The discipline of daily code commits also forces us to break our work down into smaller chunks while still keeping trunk in a working, releasable state. And version control becomes an integral mechanism of how the team communicates with each other—everyone has a better shared understanding of the system, is aware of the state of the deployment pipeline, and can help each other when it breaks. As a result, we achieve higher quality and faster deployment lead times.
Having these practices in place, we can now again modify our definition of “done” (addition in bold text): “At the end of each development interval, we must have integrated, tested, working, and potentially shippable code, demonstrated in a production-like environment, created from trunk using a one-click process, and validated with automated tests.”
Adhering to this revised definition of done helps us further ensure the ongoing testability and deployability of the code we’re producing. By keeping our code in a deployable state, we are able to eliminate the common practice of having a separate test and stabilization phase at the end of the project.
Case Study
Continuous Integration at Bazaarvoice (2012)
Ernest Mueller, who helped engineer the DevOps transformation at National Instruments, later helped transform the development and release processes at Bazaarvoice in 2012. Bazaarvoice supplies customer-generated content (e.g., reviews, ratings) for thousands of retailers, such as Best Buy, Nike, and Walmart.
At that time, Bazaarvoice had $120 million in revenue and was preparing for an IPO.§ The business was primarily driven by the Bazaarvoice Conversations application, a monolithic Java application comprising nearly five million lines of code dating back to 2006, spanning fifteen thousand files. The service ran on 1,200 servers across four data centers and multiple cloud service providers.
Partially as a result of switching to an Agile development process and to two-week development intervals, there was a tremendous desire to increase release frequency from their current ten-week production release schedule. They had also started to decouple parts of their monolithic application, breaking it down into microservices.
Their first attempt at a two-week release schedule was in January of 2012. Mueller observed, “It didn’t go well. It caused massive chaos, with forty-four production incidents filed by our customers. The major reaction from management was basically ‘Let’s not ever do that again.’”
Mueller took over the release processes shortly afterward, with the goal of doing bi-weekly releases without causing customer downtime. The business objectives for releasing more frequently included enabling faster A/B testing (described in upcoming chapters) and increasing the flow of features into production. Mueller identified three core problems:
Mueller concluded that the monolithic Conversations application deployment process needed to be stabilized, which required continuous integration. In the six weeks that followed, developers stopped doing feature work to focus instead on writing automated testing suites, including unit tests in JUnit, regression tests in Selenium, and getting a deployment pipeline running in TeamCity. “By running these tests all the time, we felt like we could make changes with some level of safety. And most importantly, we could immediately find when someone broke something, as opposed to discovering it only after it’s in production.”
They also changed to a trunk/branch release model, where every two weeks they created a new dedicated release branch, with no new commits allowed to that branch unless there was an emergency. All changes would be worked through a sign-off process, either per-ticket or per-team through their internal wiki. That branch would go through a QA process and would then be promoted into production.
The improvements to predictability and quality of the releases were startling:
Mueller further described how successful this effort was:
We had such success with releases every two weeks, we went to weekly releases, which required almost no changes from the engineering teams. Because releases became so routine, it was as simple as doubling the number of releases on the calendar and releasing when the calendar told us to. Seriously, it was almost a non-event. The majority of changes required were in our customer service and marketing teams, who had to change their processes, such as changing the schedule of their weekly customer emails to make sure customers knew that feature changes were coming. After that, we started working toward our next goals, which eventually led to speeding up our testing times from three-plus hours to less than an hour, reducing the number of environments from four to three (Dev, Test, Production, eliminating Staging), and moving to a full continuous delivery model where we enable fast, one-click deployments.
Trunk-based development is likely the most controversial practice discussed in this book. Many engineers will not believe that it’s possible, especially those who prefer working uninterrupted on a private branch without having to deal with other developers. However, the data from Puppet Labs’ 2015 State of DevOps Report is clear: trunk-based development predicts higher throughput and better stability, and even higher job satisfaction and lower rates of burnout.
While convincing developers may be difficult at first, once they see the extraordinary benefits, they will likely become lifetime converts, as the HP LaserJet and Bazaarvoice examples illustrate. Continuous integration practices set the stage for the next step, which is automating the deployment process and enabling low-risk releases.